The Lethal Trifecta is three capability classes landing together in one agent session: untrusted input (web pages, user content, repo text), sensitive access (secrets, files, databases), and external comms (network calls, email, webhooks). Static analysis tells you which agents could form the trifecta based on their tool inventory. Runtime enforcement catches the call that would actually complete it, and stops that call.
How session coloring works
Every tool call is stamped with one or more capability classes the moment the agent makes it. ActPass accumulates these stamps across the live session. The first read_file on a sensitive path? That's sensitive_access. The earlier fetch to pull a README? untrusted_input. Neither alone is dangerous. But the moment a third call would complete the trifecta (say, a send_email), the enforcement layer already knows what the session has accumulated and blocks the call before it executes.
What the hook sees
The ActPass PreToolUse hook receives every tool call before Claude or Codex executes it. It runs a deterministic policy check, with no LLM in the decision path, and either passes the call through (exit 0) or blocks it with an explanation the agent can relay to you (exit 2). Session state lives in-process; each check combines a preflight call to the ActPass API, cached for 60 seconds, with local, file-based policy evaluation.
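To make the exit-code contract concrete, here is a minimal sketch of a hook's decision step. The input field names follow the JSON payload Claude Code passes to hooks on stdin; the Policy type, the runHook function, and the denyEmail demo policy are hypothetical names invented for illustration and are not ActPass's actual code.

```typescript
// Subset of the JSON payload a PreToolUse hook receives on stdin.
interface HookInput {
  session_id: string;
  tool_name: string;
  tool_input: Record<string, unknown>;
}

interface HookResult {
  exitCode: 0 | 2; // 0 = pass the call through, 2 = block it
  stderr: string;  // on exit 2, the explanation relayed back to the agent
}

// Hypothetical policy shape: return a reason to block, or null to allow.
type Policy = (input: HookInput) => string | null;

function runHook(rawStdin: string, policy: Policy): HookResult {
  let input: HookInput;
  try {
    input = JSON.parse(rawStdin) as HookInput;
  } catch {
    // Failing closed on unparseable input is a choice made for this sketch.
    return { exitCode: 2, stderr: "Could not parse hook input" };
  }
  const reason = policy(input);
  return reason === null
    ? { exitCode: 0, stderr: "" }
    : { exitCode: 2, stderr: `Blocked ${input.tool_name}: ${reason}` };
}

// Demo policy (an assumption): treat send_email as forbidden.
const denyEmail: Policy = (i) =>
  i.tool_name === "send_email" ? "external comms not allowed here" : null;

const passed = runHook(
  JSON.stringify({ session_id: "s1", tool_name: "read_file", tool_input: { path: "README.md" } }),
  denyEmail,
);
const blocked = runHook(
  JSON.stringify({ session_id: "s1", tool_name: "send_email", tool_input: { to: "a@example.com" } }),
  denyEmail,
);
```

A real hook script would read stdin, write the stderr field to standard error, and exit with exitCode. Keeping the decision in a plain function is what makes the check deterministic and easy to test.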
The coloring logic lives entirely in lib/actpass/coloring.ts: classesForToolCall() maps a tool name and arguments to capability classes, mergeClasses() accumulates the session set, and evaluateColoring() decides whether the next call would complete the trifecta or mix red/blue MCP colors. This logic is pure (no I/O) and fully unit-tested.
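As a rough illustration of how three pure functions like these could fit together, the sketch below reuses the function names from the text. The signatures, the return shape, and the classification rules (which tool names and paths count as sensitive) are assumptions, and the real lib/actpass/coloring.ts may differ. It covers only the trifecta check, not MCP colors.

```typescript
type CapabilityClass = "untrusted_input" | "sensitive_access" | "external_comms";

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

const TRIFECTA: readonly CapabilityClass[] = [
  "untrusted_input",
  "sensitive_access",
  "external_comms",
];

// Assumed rules, chosen only to mirror the examples in the text.
function classesForToolCall(call: ToolCall): Set<CapabilityClass> {
  const classes = new Set<CapabilityClass>();
  if (call.name === "fetch") classes.add("untrusted_input");
  if (call.name === "read_file") {
    const path = String(call.args["path"] ?? "");
    if (path.includes(".env") || path.includes("secrets")) classes.add("sensitive_access");
  }
  if (call.name === "send_email") classes.add("external_comms");
  return classes;
}

// Returns a new set; neither input is mutated.
function mergeClasses(
  session: ReadonlySet<CapabilityClass>,
  next: ReadonlySet<CapabilityClass>,
): Set<CapabilityClass> {
  const merged = new Set<CapabilityClass>();
  session.forEach((c) => merged.add(c));
  next.forEach((c) => merged.add(c));
  return merged;
}

// Denies a call if the session would hold all three classes after it.
// A denied call leaves the session set unchanged.
function evaluateColoring(
  session: ReadonlySet<CapabilityClass>,
  call: ToolCall,
): { allow: boolean; session: ReadonlySet<CapabilityClass> } {
  const merged = mergeClasses(session, classesForToolCall(call));
  const allow = !TRIFECTA.every((c) => merged.has(c));
  return { allow, session: allow ? merged : session };
}

// Replay the session described earlier: fetch, then read_file, then send_email.
const calls: ToolCall[] = [
  { name: "fetch", args: { url: "https://example.com/README.md" } },
  { name: "read_file", args: { path: "/app/.env" } },
  { name: "send_email", args: { to: "someone@example.com" } },
];
let session: ReadonlySet<CapabilityClass> = new Set<CapabilityClass>();
const verdicts: boolean[] = [];
for (const call of calls) {
  const result = evaluateColoring(session, call);
  verdicts.push(result.allow);
  session = result.session;
}
// verdicts is [true, true, false]; the session still holds only two classes.
```

Because no function here touches the network or the filesystem, each can be unit-tested with plain inputs and outputs.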
Monitor mode first, enforce mode when ready
If you're not ready to block, start with --rule-of-two monitor. The hook passes the call through but injects a warning into the agent's systemMessage: “This session has now combined untrusted input + sensitive access + external comms. You are operating in the Lethal Trifecta. Proceed with extreme caution.” The agent sees it, the user's transcript shows it, and it creates a natural pause for review before switching to enforce.
Switch to --rule-of-two enforce when you want hard stops. In a CI/CD pipeline or automated agent loop, monitor is noise — enforce is the only signal that matters.
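The difference between the two modes comes down to what the hook emits when a call would complete the trifecta. The sketch below assumes the hook reports through an exit code, a JSON object on stdout carrying systemMessage, and a stderr explanation. The outcomeFor function, the HookOutcome shape, and the block message are hypothetical; only the mode names and the warning text come from the description above.

```typescript
type Mode = "monitor" | "enforce"; // the two --rule-of-two values named in the text

interface HookOutcome {
  exitCode: 0 | 2;
  stdout: string; // JSON for the hook runner; carries systemMessage in monitor mode
  stderr: string; // block explanation in enforce mode
}

const WARNING =
  "This session has now combined untrusted input + sensitive access + external comms. " +
  "You are operating in the Lethal Trifecta. Proceed with extreme caution.";

function outcomeFor(mode: Mode, completesTrifecta: boolean): HookOutcome {
  if (!completesTrifecta) {
    // Nothing to report: pass the call through silently in either mode.
    return { exitCode: 0, stdout: "", stderr: "" };
  }
  if (mode === "monitor") {
    // Pass the call through, but surface the warning.
    return { exitCode: 0, stdout: JSON.stringify({ systemMessage: WARNING }), stderr: "" };
  }
  // Enforce: hard stop. The message wording is an assumption.
  return {
    exitCode: 2,
    stdout: "",
    stderr: "Blocked: this call would complete the Lethal Trifecta.",
  };
}

const monitored = outcomeFor("monitor", true);
const enforced = outcomeFor("enforce", true);
```

In this sketch the trifecta decision itself is identical in both modes; only the response changes, which is why moving from monitor to enforce is a flag change rather than a policy rewrite.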
MCP color enforcement
Anthropic's MCP color spec marks servers as red (untrusted/user-controlled) or blue (external comms). Mixing both in one session is a prompt injection waiting to happen: a red server feeds poisoned content, and the blue server ships it out. The --block-color-mix flag catches the call that would be the second color in a session and blocks before the data moves.
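A color-mix check can be sketched as a small pure function over the set of colors the session has already touched. The server names, the serverColors registry, and the wouldMixColors function below are hypothetical; how ActPass actually learns a server's color is not described here.

```typescript
type McpColor = "red" | "blue";

// Hypothetical registry of MCP servers and their colors.
// Servers missing from the map are treated as uncolored.
const serverColors: Record<string, McpColor | undefined> = {
  "web-scraper": "red",
  "mailer": "blue",
};

// True if calling this server would put both colors into one session.
function wouldMixColors(
  seen: ReadonlySet<McpColor>,
  server: string,
  colors: Record<string, McpColor | undefined>,
): boolean {
  const color = colors[server];
  if (color === undefined) return false; // uncolored servers never trigger a mix
  const opposite: McpColor = color === "red" ? "blue" : "red";
  return seen.has(opposite);
}

// Replay a session: a red server first, an uncolored one, then a blue one.
const seen = new Set<McpColor>();
const decisions: string[] = [];
for (const server of ["web-scraper", "calculator", "mailer"]) {
  if (wouldMixColors(seen, server, serverColors)) {
    decisions.push(`${server}:block`);
    continue; // a blocked call does not color the session
  }
  const color = serverColors[server];
  if (color !== undefined) seen.add(color);
  decisions.push(`${server}:allow`);
}
// decisions is ["web-scraper:allow", "calculator:allow", "mailer:block"]
```

The check is symmetric: a session that starts with a blue server would have its first red call blocked in the same way.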
Zero-friction path to enforcement
The full stack is three steps: run actpass exposure to classify your tool inventory, review the report to confirm which agents hold dangerous combinations, then add the hook to your ~/.claude/settings.json with --rule-of-two enforce. The hook is local, deterministic, and sub-millisecond on cache hit. Your agents don't slow down; they just can't accidentally complete the trifecta.
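For the third step, the sketch below builds the kind of PreToolUse entry that Claude Code reads from settings.json and serializes it to JSON. The hooks structure follows Claude Code's documented settings format. The executable name is a placeholder, because the text gives the --rule-of-two enforce flag but not the exact hook command.

```typescript
// Placeholder: substitute the real ActPass hook command for the bracketed part.
const HOOK_COMMAND = "<actpass-hook-command> --rule-of-two enforce";

const settings = {
  hooks: {
    PreToolUse: [
      {
        matcher: "*", // run the hook on every tool call
        hooks: [{ type: "command", command: HOOK_COMMAND }],
      },
    ],
  },
};

// The JSON to merge into ~/.claude/settings.json.
const settingsJson = JSON.stringify(settings, null, 2);
```

If your settings file already contains a hooks section, merge this entry into the existing PreToolUse array instead of overwriting the file.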
Source: Anthropic MCP specification (color annotations); Simon Willison, “The Lethal Trifecta” (2025); Invariant Labs, MCP coloring research.