Every team shipping AI agents is quietly accumulating the same risk, and most don't have a name for it yet. Security researcher Simon Willison does: the Lethal Trifecta. An agent becomes genuinely dangerous the moment it can do all three of these at once:
- Read untrusted content — a web page, an email, a PDF from a prospect, a GitHub issue.
- Access sensitive data — secrets, customer records, a payments key, production config.
- Communicate externally / change state — send an email, post, write to a repo, make a payment.
With all three, a single attacker-controlled sentence buried in that web page can instruct the agent to read your secrets and exfiltrate them, and the model will often comply. This isn't hypothetical: it's how a Chevrolet dealership's chatbot was talked into agreeing to sell a car for $1, how Brave demonstrated Comet's assistant leaking a Gmail code, and what The Economist meant by “why AI systems may never be secure.”
You can't prompt-engineer your way out
The instinct is to add a smarter filter: a guard model, spotlighting, a classifier. The research is brutal on this: in “The Attacker Moves Second,” teams from OpenAI, Anthropic, and Google DeepMind showed adaptive attacks bypassing essentially every AI-based defense, with attack success rates of 80–100%. You cannot secure AI with more AI, because the attacker always gets the last move.
The problem: nobody can see the trifecta
The trifecta is rarely one tool — it's an emergent property of the tool set you wired together. A developer adds a web-search MCP server on Monday and a Stripe tool on Thursday, and now an agent that already reads the inbox is one config change away from danger. No one decided to build a lethal trifecta; it assembled itself across three pull requests.
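To make "emergent property of the tool set" concrete, here is a minimal sketch of the check, assuming each tool has already been tagged with the legs it contributes. The tool names and the `TOOL_LEGS` map are hypothetical illustrations, not ActPass's data model; the point is only that the verdict comes from the union across tools, not from any single one.

```python
from enum import Enum, auto


class Leg(Enum):
    UNTRUSTED_CONTENT = auto()
    SENSITIVE_DATA = auto()
    EXTERNAL_ACTION = auto()


# Hypothetical, hand-written capability map (an assumption for illustration).
TOOL_LEGS = {
    "read_inbox": {Leg.SENSITIVE_DATA},
    "web_search": {Leg.UNTRUSTED_CONTENT},
    "stripe_refund": {Leg.SENSITIVE_DATA, Leg.EXTERNAL_ACTION},
}


def legs_of(tools):
    """Union of the legs contributed by every tool wired into one agent."""
    found = set()
    for tool in tools:
        # Unknown tools contribute nothing here; a real scanner should
        # probably flag them for review instead of ignoring them.
        found |= TOOL_LEGS.get(tool, set())
    return found


def is_lethal_trifecta(tools):
    """True only when the tool set as a whole covers all three legs."""
    return legs_of(tools) == set(Leg)


# The same agent after each of three pull requests.
history = [
    ["read_inbox"],
    ["read_inbox", "web_search"],
    ["read_inbox", "web_search", "stripe_refund"],
]
for tools in history:
    print(tools, is_lethal_trifecta(tools))
```

No single tool in the last list covers all three legs, yet the agent does. That is why reviewing tools one pull request at a time tends to miss it.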
ActPass: a 60-second exposure report — read-only, nothing blocked
Point ActPass at the agents (or the mcp.json) you already have. It classifies every tool into the three capability classes, colors them red (untrusted content) or blue (external action), and tells you which agents are exposed. It's pure static analysis: no proxy, no runtime, nothing in your request path, so nothing breaks.
$ actpass exposure --agents agents.json
# ActPass Agent Exposure Report
**4** agents · **2** Lethal Trifecta · **1** red+blue violation
## support-bot — 🚨 LETHAL TRIFECTA (🔴🔵 red+blue)
- untrusted content: web_search
- sensitive data: stripe_refund
- external comms: stripe_refund, send_email
> Keep at most two legs (Rule of Two). Lowest-cost fix:
> gate the action tools behind human approval, or move
> untrusted-content reads into a separate quarantined agent.
## readonly-analyst — ✅ ok (⚪ none)
Every finding comes with a Rule-of-Two remediation: the least-disruptive leg to neutralize, and how (human-in-the-loop approval, trusted-recipient allowlists, or splitting untrusted reads into a quarantined agent). You decide what to fix; ActPass just makes the invisible visible.
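One way to think about "least-disruptive leg" is sketched below. It assumes, purely for illustration, that disruption is proportional to the number of tools you would have to gate or move; ActPass may weigh this differently. The `CLASSIFICATION` map mirrors the support-bot finding in the sample report, and the function names are hypothetical.

```python
LEGS = ("untrusted_content", "sensitive_data", "external_action")

# Tool classification copied from the support-bot finding in the sample report.
CLASSIFICATION = {
    "web_search": {"untrusted_content"},
    "stripe_refund": {"sensitive_data", "external_action"},
    "send_email": {"external_action"},
}


def tools_by_leg(tools):
    """Group an agent's tools under each leg they contribute to."""
    by_leg = {leg: [] for leg in LEGS}
    for tool in tools:
        for leg in CLASSIFICATION.get(tool, ()):
            by_leg[leg].append(tool)
    return by_leg


def cheapest_leg_to_cut(tools):
    """Return (leg, tools_to_gate) for an exposed agent, else None.

    Assumption: the leg backed by the fewest tools is the cheapest to
    neutralize. Ties go to the first leg in LEGS order.
    """
    by_leg = tools_by_leg(tools)
    if not all(by_leg.values()):
        return None  # at most two legs present, Rule of Two already holds
    leg = min(LEGS, key=lambda name: len(by_leg[name]))
    return leg, by_leg[leg]


print(cheapest_leg_to_cut(["web_search", "stripe_refund", "send_email"]))
```

For the support-bot tool set this picks the untrusted-content leg, which lines up with the report's suggestion to move untrusted reads into a quarantined agent. A real cost model would also account for how much each tool matters to the workflow.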
Wire it into CI so it never comes back
Save a baseline and let ActPass fail a pull request only when it introduces a new trifecta. Accepted risk doesn't nag you, and regressions can't merge:
$ actpass exposure --agents agents.json \
--baseline exposure-baseline.json
[actpass] FAIL — new exposure introduced.
trifecta: [deploy-bot]
Start with visibility, graduate to enforcement
The exposure report is the front door because it has zero friction. When you're ready, the same deterministic engine enforces at runtime — human-in-the-loop approvals, allow/deny, and a tamper-evident evidence ledger — so “AI proposes, the deterministic engine decides.” The Agents & Exposure guide walks through both in minutes.
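As a rough illustration of "AI proposes, the deterministic engine decides," the sketch below pairs a plain lookup-table policy with a hash-chained log. Everything here (the `POLICY` table, the decision strings, the `Ledger` class) is a hypothetical stand-in, not ActPass's actual engine or ledger format.

```python
import hashlib
import json

# Hypothetical policy table: tool name -> decision.
POLICY = {
    "web_search": "allow",
    "send_email": "needs_approval",
    "stripe_refund": "needs_approval",
}


def decide(tool):
    """Deterministic lookup; any tool not listed is denied by default."""
    return POLICY.get(tool, "deny")


class Ledger:
    """Append-only log where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash, record):
        payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = self._digest(prev_hash, record)
        self.entries.append({"record": record, "prev": prev_hash, "hash": digest})

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            expected = self._digest(prev_hash, entry["record"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


def handle(tool, ledger):
    """Record the decision for a proposed tool call and return it."""
    decision = decide(tool)
    ledger.append({"tool": tool, "decision": decision})
    return decision


ledger = Ledger()
print(handle("web_search", ledger))   # listed as allow
print(handle("send_email", ledger))   # listed as needs_approval
print(handle("delete_repo", ledger))  # unlisted, so denied
print(ledger.verify())
```

The model never appears in `decide`: the same proposed call always gets the same answer, which is what makes the outcome auditable. A chain like this is only tamper-evident if the latest hash is also stored somewhere the writer cannot rewrite.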
Run actpass exposure against your agents, or send us your mcp.json and we'll hand back the map. No agent in the loop, nothing to break: just the truth about your blast radius.