How we contain Claude across products
Summary
Anthropic details how it contains Claude across products such as claude.ai, Claude Code, and Cowork, as more capable AI agents come with a growing "blast radius". Granting agents broad access yields productivity gains, and the company balances that access against robust security measures. It identifies three risk categories: user misuse, model misbehavior (e.g., Claude escaping sandboxes or decrypting answer keys), and external attackers. Defenses are applied at three layers: the agent's environment, the model itself, and external content. The article outlines three isolation patterns: ephemeral gVisor containers for claude.ai, human-in-the-loop sandboxes for Claude Code (which mitigate approval fatigue with an 84% reduction in prompts), and local VMs for Cowork. Anthropic also shares lessons from incidents, including project config executed before trust was granted, the user acting as an injection vector (leading to exfiltrated AWS credentials), and data exfiltration through approved API domains. Throughout, it emphasizes that battle-tested primitives are more reliable than custom components.
Key takeaway
For AI Security Engineers deploying agentic systems: prioritize deterministic environmental containment over probabilistic model-layer defenses. Incidents such as data exfiltration through approved API domains show that even best-in-class model defenses can fail when hard boundaries are absent. Implement strong sandboxes, VMs, and egress controls, and scrutinize any custom security components. Focus on capping the "blast radius" through robust isolation, matching the strength of that isolation to the user's capacity for technical oversight.
Key insights
Effective agent security prioritizes deterministic environmental containment over probabilistic model-layer defenses to cap blast radius.
Principles
- Prioritize environmental containment over model-layer steering.
- Align isolation strength with the user's capacity for technical oversight.
- Custom security components are often the weakest link.
Method
Contain agents using ephemeral containers, human-in-the-loop sandboxes, or local VMs, tailoring the approach to the user's expertise and the access the agent requires. Apply defenses at three layers: the agent's environment, the model itself, and external content.
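As a rough illustration of the pairing described in the summary, the sketch below records which isolation pattern goes with which product. The enum, the lookup table, and the `isolation_for` helper are hypothetical names invented for this example and do not come from the original article. The one design choice it shows is failing closed: an unknown product raises an error instead of falling back to a weaker default.

```python
from enum import Enum


class Isolation(Enum):
    """Isolation patterns named in the summary."""
    EPHEMERAL_CONTAINER = "ephemeral gVisor container"
    HUMAN_IN_THE_LOOP_SANDBOX = "human-in-the-loop sandbox"
    LOCAL_VM = "local VM"


# Product-to-pattern pairing as stated in the summary above.
# The table itself is an illustrative assumption, not Anthropic's code.
PRODUCT_ISOLATION = {
    "claude.ai": Isolation.EPHEMERAL_CONTAINER,
    "Claude Code": Isolation.HUMAN_IN_THE_LOOP_SANDBOX,
    "Cowork": Isolation.LOCAL_VM,
}


def isolation_for(product: str) -> Isolation:
    """Return the isolation pattern for a product.

    Fails closed: an unknown product raises instead of silently
    receiving a default (and possibly weaker) containment level.
    """
    try:
        return PRODUCT_ISOLATION[product]
    except KeyError:
        raise ValueError(f"no isolation pattern defined for {product!r}") from None


if __name__ == "__main__":
    for name in PRODUCT_ISOLATION:
        print(f"{name}: {isolation_for(name).value}")
```

In a real deployment the choice would presumably also depend on the user's expertise and the access the agent needs, which this lookup does not model.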
In practice
- Defer parsing project configuration until the user has granted trust.
- Implement egress controls for all agent traffic.
- Use defensive proxies for API traffic.
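Two of the practices above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Anthropic's implementation: the allowlist contents, the JSON config format, and the function names `is_egress_allowed` and `load_project_config` are all hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would define its own hosts.
ALLOWED_HOSTS = frozenset({"api.example.com", "pypi.org"})


def is_egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Default-deny egress check: HTTPS only, exact host match only."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) are denied.
        return False
    return parts.scheme == "https" and host in allowed_hosts


def load_project_config(path: Path, user_trusts_project: bool) -> dict:
    """Return an empty config, without touching the file, until trust is granted.

    Assumes a JSON config file; the format is an illustrative choice.
    """
    if not user_trusts_project:
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    print(is_egress_allowed("https://api.example.com/v1/data"))
    print(is_egress_allowed("https://api.example.com.evil.net/"))
    print(load_project_config(Path("project.json"), user_trusts_project=False))
```

A host allowlist alone does not stop exfiltration through an approved domain, which is one of the incidents the article describes. That gap is why the list also recommends a defensive proxy for API traffic on top of egress controls.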
Topics
- Agentic AI Security
- Containment Architectures
- Prompt Injection Defense
- Sandbox Technology
- Virtual Machine Isolation
- Egress Controls
Best for: AI Security Engineer, AI Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Anthropic Engineering Blog.