The Lethal Trifecta is hiding in your AI agents — find it in 60 seconds

Every team shipping AI agents is quietly accumulating the same risk, and most don't have a name for it yet. Security researcher Simon Willison does: the Lethal Trifecta. An agent becomes genuinely dangerous the moment it can do all three of these at once:

Read untrusted content — a web page, an email, a PDF from a prospect, a GitHub issue.
Access sensitive data — secrets, customer records, a payments key, production config.
Communicate externally / change state — send an email, post, write to a repo, make a payment.

All three at once = an attacker's sentence can read your secrets and send them out.

With all three, a single attacker-controlled sentence buried in that web page can instruct the agent to read your secrets and exfiltrate them, and the model will often comply. This isn't hypothetical: it's how a Chevrolet dealership chatbot was talked into agreeing to sell a car for $1, how Brave demonstrated Comet's assistant leaking a Gmail code, and what The Economist meant by “why AI systems may never be secure.”
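To make the definition concrete, here is a minimal sketch of the trifecta as a set check. The tool names come from the sample report later in this post; the capability tags assigned to each tool, and the names `TOOL_LEGS`, `legs_held`, and `is_lethal_trifecta`, are illustrative assumptions, not ActPass internals.

```python
from typing import Dict, FrozenSet, Iterable

UNTRUSTED = "untrusted_content"
SENSITIVE = "sensitive_data"
EXTERNAL = "external_action"
TRIFECTA: FrozenSet[str] = frozenset({UNTRUSTED, SENSITIVE, EXTERNAL})

# Hypothetical tagging: which legs each tool contributes to an agent.
TOOL_LEGS: Dict[str, FrozenSet[str]] = {
    "web_search": frozenset({UNTRUSTED}),
    "stripe_refund": frozenset({SENSITIVE, EXTERNAL}),
    "send_email": frozenset({EXTERNAL}),
}


def legs_held(tools: Iterable[str]) -> FrozenSet[str]:
    """Union of the legs contributed by every tool the agent can call."""
    held = set()
    for tool in tools:
        # Assumption: unknown tools contribute nothing. A stricter
        # checker might treat an unknown tool as holding all three legs.
        held |= TOOL_LEGS.get(tool, frozenset())
    return frozenset(held)


def is_lethal_trifecta(tools: Iterable[str]) -> bool:
    """True when the combined tool set covers all three legs at once."""
    return legs_held(tools) >= TRIFECTA


if __name__ == "__main__":
    print(is_lethal_trifecta(["web_search", "stripe_refund", "send_email"]))
    print(is_lethal_trifecta(["web_search", "send_email"]))
```

Note that no single tool in this sketch holds all three legs; the exposure only appears once the sets are combined.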

You can't prompt-engineer your way out

The instinct is to add a smarter filter: a guard model, spotlighting, a classifier. The research is brutal on this. In “The Attacker Moves Second,” researchers from OpenAI, Anthropic, and Google DeepMind showed adaptive attacks bypassing essentially every AI-based defense they tested, with attack success rates of 80–100%. You cannot secure AI with more AI, because the attacker always gets the last move.

What does hold up is architectural, not probabilistic: remove a capability and no prompt can re-add it. Meta calls this the Rule of Two: an agent may safely hold at most two of the three legs. That guarantee is deterministic, and it's the foundation ActPass is built on.
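A small sketch of why removing a capability is deterministic: if tool dispatch goes through a fixed registry, a tool the agent was never granted cannot be reached, whatever the prompt says. The `Agent` class, `ToolNotGranted`, and the stub `web_search` below are hypothetical names for illustration, not part of any real agent framework.

```python
from typing import Callable, Dict


class ToolNotGranted(Exception):
    """Raised when the model asks for a tool this agent was never given."""


class Agent:
    def __init__(self, name: str, granted: Dict[str, Callable[[str], str]]) -> None:
        self.name = name
        # Copy the registry so later changes to the caller's dict
        # cannot silently add tools to this agent.
        self._granted = dict(granted)

    def call_tool(self, tool_name: str, argument: str) -> str:
        """Dispatch a model-requested tool call through the fixed registry."""
        handler = self._granted.get(tool_name)
        if handler is None:
            # No model output can change this branch: the check is plain
            # code, not a classifier that an injection could talk around.
            raise ToolNotGranted(f"{self.name} has no tool named {tool_name!r}")
        return handler(argument)


def web_search(query: str) -> str:
    """Stub standing in for a real search tool."""
    return f"results for {query}"


# This agent reads untrusted content but holds no external-action tool.
reader = Agent("quarantined-reader", {"web_search": web_search})

if __name__ == "__main__":
    print(reader.call_tool("web_search", "actpass"))
    try:
        reader.call_tool("send_email", "all the secrets")
    except ToolNotGranted as err:
        print("blocked:", err)
```

An injected instruction can still make the model ask for `send_email`; the request simply has nowhere to go.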

The problem: nobody can see the trifecta

The trifecta is rarely one tool — it's an emergent property of the tool set you wired together. A developer adds a web-search MCP server on Monday and a Stripe tool on Thursday, and now an agent that already reads the inbox is one config change away from danger. No one decided to build a lethal trifecta; it assembled itself across three pull requests.

ActPass: a 60-second exposure report — read-only, nothing blocked

Point ActPass at the agents (or the mcp.json) you already have. It classifies every tool into the three capability classes, colors them red (untrusted content) or blue (external action), and tells you which agents are exposed. It's pure static analysis: no proxy, no runtime, nothing in your request path, nothing breaks.
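For intuition only, here is a rough sketch of what static red/blue classification over an MCP config could look like. It assumes the common `mcpServers` layout for mcp.json and uses crude keyword matching on server names; the hint lists, function names, and sample config are all made up for illustration, and this post does not describe how ActPass actually classifies tools.

```python
import json
from typing import Dict, Set, Tuple

# Hypothetical keyword hints. A real classifier would need far more signal
# than a server name (tool schemas, scopes, documentation, and so on).
RED_HINTS = ("search", "fetch", "browser", "inbox", "github", "pdf")
BLUE_HINTS = ("email", "send", "stripe", "payment", "refund")


def classify_server(name: str) -> Set[str]:
    """Return the colors ("red", "blue") suggested by a server's name."""
    lowered = name.lower()
    colors: Set[str] = set()
    if any(hint in lowered for hint in RED_HINTS):
        colors.add("red")
    if any(hint in lowered for hint in BLUE_HINTS):
        colors.add("blue")
    return colors


def exposure(config_text: str) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Classify every configured server and combine the colors.

    Only parses the config text; nothing is executed or proxied.
    """
    servers = json.loads(config_text).get("mcpServers", {})
    per_server = {name: classify_server(name) for name in servers}
    combined: Set[str] = set().union(*per_server.values())
    return per_server, combined


# Made-up config for the example; the package names are placeholders.
SAMPLE_CONFIG = """
{
  "mcpServers": {
    "web-search": {"command": "npx", "args": ["-y", "example-web-search"]},
    "stripe": {"command": "npx", "args": ["-y", "example-stripe"]}
  }
}
"""

if __name__ == "__main__":
    per_server, combined = exposure(SAMPLE_CONFIG)
    for name, colors in per_server.items():
        print(name, sorted(colors))
    if {"red", "blue"} <= combined:
        print("red+blue violation: untrusted reads and external actions share an agent")
```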

🔴 Red (untrusted content): web search, PDF from a prospect, GitHub issue, inbound email

One or the other, never both.

🔵 Blue (critical actions): send email, delete data, change permissions, make a payment

Split untrusted reading and consequential actions into different agents, so the injection has nowhere to land.

$ actpass exposure --agents agents.json

# ActPass Agent Exposure Report

**4** agents · **2** Lethal Trifecta · **1** red+blue violation

## support-bot — 🚨 LETHAL TRIFECTA  (🔴🔵 red+blue)
- untrusted content:  web_search
- sensitive data:     stripe_refund
- external comms:     stripe_refund, send_email

> Keep at most two legs (Rule of Two). Lowest-cost fix:
> gate the action tools behind human approval, or move
> untrusted-content reads into a separate quarantined agent.

## readonly-analyst — ✅ ok  (⚪ none)

Every finding comes with a Rule-of-Two remediation: the least-disruptive leg to neutralize, and how (human-in-the-loop approval, trusted-recipient allowlists, or splitting untrusted reads into a quarantined agent). You decide what to fix — ActPass just makes the invisible visible.

Wire it into CI so it never comes back

Save a baseline and let ActPass fail a pull request only when it introduces a new trifecta. Accepted risk doesn't nag you, and regressions can't merge:

$ actpass exposure --agents agents.json \
    --baseline exposure-baseline.json
[actpass] FAIL — new exposure introduced.
  trifecta: [deploy-bot]
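The baseline idea reduces to a set difference: fail only on agents that are exposed now but were not exposed when the baseline was saved. The sketch below assumes a made-up JSON shape with a single `trifecta` list of agent names; the real format of exposure-baseline.json is not documented in this post.

```python
import json
from typing import Iterable, List


def new_exposures(baseline_agents: Iterable[str], current_agents: Iterable[str]) -> List[str]:
    """Agents exposed now that were not in the accepted baseline."""
    return sorted(set(current_agents) - set(baseline_agents))


def ci_exit_code(baseline_json: str, current_json: str) -> int:
    """Return 1 (fail the build) only when a new exposure appears."""
    # Assumed shape for both documents: {"trifecta": ["agent-name", ...]}
    baseline = json.loads(baseline_json)["trifecta"]
    current = json.loads(current_json)["trifecta"]
    introduced = new_exposures(baseline, current)
    if introduced:
        print(f"FAIL: new exposure introduced: {introduced}")
        return 1
    print("OK: no new exposure")
    return 0


if __name__ == "__main__":
    accepted = '{"trifecta": ["support-bot"]}'
    after_pr = '{"trifecta": ["support-bot", "deploy-bot"]}'
    raise SystemExit(ci_exit_code(accepted, after_pr))
```

Because the comparison only looks at additions, an agent that was already in the baseline never fails the build, and fixing an exposed agent never does either.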

Start with visibility, graduate to enforcement

The exposure report is the front door because it has zero friction. When you're ready, the same deterministic engine enforces at runtime — human-in-the-loop approvals, allow/deny, and a tamper-evident evidence ledger — so “AI proposes, the deterministic engine decides.” The Agents & Exposure guide walks through both in minutes.
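As a rough illustration of "AI proposes, the deterministic engine decides," the sketch below pairs a static policy table with a hash-chained log, one common way to make a record tamper-evident. The policy values, the `Ledger` class, and the `drop_database` tool are assumptions made for this example; ActPass's actual enforcement engine and ledger format are not described here.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: Dict[str, str]) -> str:
    """SHA-256 over the previous hash plus a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


class Ledger:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self) -> None:
        self.entries: List[dict] = []

    def append(self, record: Dict[str, str]) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = entry_hash(prev, record)
        self.entries.append({"record": record, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; an edited entry breaks every later link."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True


# Hypothetical policy: the decision is a table lookup, never a model call.
POLICY = {
    "web_search": "allow",
    "send_email": "require_approval",
    "drop_database": "deny",
}


def decide(tool: str, ledger: Ledger) -> str:
    """Look up the decision for a proposed tool call and record it."""
    decision = POLICY.get(tool, "deny")  # unknown tools are denied by default
    ledger.append({"tool": tool, "decision": decision})
    return decision


if __name__ == "__main__":
    ledger = Ledger()
    for proposed in ("web_search", "send_email", "wire_transfer"):
        print(proposed, "->", decide(proposed, ledger))
    print("ledger intact:", ledger.verify())
```

A hash chain only makes tampering detectable, not impossible; in practice the latest hash would also need to be anchored somewhere the agent host cannot rewrite.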

Want your report? Run actpass exposure against your agents, or send us your mcp.json and we'll hand back the map. No agent in the loop, nothing to break: just the truth about your blast radius.

