How we contain Claude across products

Source: Anthropic Engineering Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering · Depth: Advanced, extended

Summary

Anthropic details how it contains Claude across products such as claude.ai, Claude Code, and Cowork, in response to the growing "blast radius" of capable AI agents. The company weighs the productivity gains of granting agents broad access against the security measures needed to make that access safe. It identifies three risk categories: user misuse, model misbehavior (e.g., Claude escaping sandboxes or decrypting answer keys), and external attackers. Defenses are applied at three layers: the agent's environment, the model itself, and external content. The article outlines three isolation patterns: ephemeral gVisor containers for claude.ai, human-in-the-loop sandboxes for Claude Code (mitigating approval fatigue with an 84% reduction in prompts), and local VMs for Claude Cowork. Anthropic also shares lessons from incidents, including pre-trust config execution, the user as an injection vector (exfiltrating AWS credentials), and data exfiltration through approved API domains, and emphasizes that battle-tested primitives are more reliable than custom components.

Key takeaway

For AI Security Engineers deploying agentic systems, prioritize deterministic environmental containment over probabilistic model-layer defenses. Incidents like data exfiltration through approved API domains highlight that even best-in-class model defenses can fail if hard boundaries are absent. Implement strong sandboxes, VMs, and egress controls, and be critical of custom security components. Your focus should be on capping the "blast radius" through robust isolation, matching its strength to the user's technical oversight capacity.
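To make "deterministic" concrete, here is a minimal Python sketch of an egress check, assuming a hypothetical allowlist of exact hostnames. The names `ALLOWED_HOSTS` and `egress_allowed` and the example hosts are illustrative and do not come from the article. The check gives the same answer for the same URL no matter what the model was told or how it was prompted.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; these hosts are placeholders, not Anthropic's configuration.
ALLOWED_HOSTS = frozenset({"api.example.com", "pypi.org"})


def egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose exact hostname is allowlisted."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URLs (e.g. a broken IPv6 literal) are denied, not guessed at.
        return False
    if parts.scheme != "https":
        return False
    # .hostname is lowercased and excludes any port or userinfo component,
    # so "https://api.example.com@evil.test/" resolves to "evil.test".
    host = parts.hostname
    return host is not None and host in allowed_hosts
```

A hostname check like this only caps where traffic can go. As the approved-API-domain incident in the summary suggests, it does not by itself stop data from leaving through a host that is on the list, so it is one layer rather than a complete control.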

Key insights

Effective agent security prioritizes deterministic environmental containment over probabilistic model-layer defenses to cap blast radius.

Method

Implement agent containment using ephemeral containers, human-in-the-loop sandboxes, or local VMs, tailoring the approach to the user's expertise and the access the agent requires. Apply defenses at three layers: the agent's environment, the model, and external content.
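One way to picture the human-in-the-loop sandbox pattern is as a boundary check: actions inside the boundary proceed without a prompt, and only actions that cross it go to the human. The Python sketch below is an illustration under assumptions, not the article's implementation. `SandboxPolicy`, `decide`, the action kinds, and the example paths and hosts are all hypothetical, and the path check is purely lexical, whereas a real sandbox enforces the boundary at the OS level.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    workspace: str            # absolute POSIX path the agent may write under
    allowed_hosts: frozenset  # exact hostnames the agent may connect to


def decide(policy: SandboxPolicy, kind: str, target: str) -> str:
    """Return "allow" for actions inside the boundary, "ask" otherwise."""
    if kind == "write":
        root = posixpath.normpath(policy.workspace)
        # Resolve relative paths and ".." lexically against the workspace.
        # Symlinks are not followed here; that is left to OS-level enforcement.
        path = posixpath.normpath(posixpath.join(root, target))
        inside = posixpath.commonpath([root, path]) == root
        return "allow" if inside else "ask"
    if kind == "connect":
        return "allow" if target.lower() in policy.allowed_hosts else "ask"
    return "ask"  # unknown action kinds always go to the human


policy = SandboxPolicy("/work/proj", frozenset({"pypi.org"}))
actions = [
    ("write", "src/main.py"),
    ("write", "tests/test_main.py"),
    ("connect", "pypi.org"),
    ("write", "../../etc/passwd"),
]
# Only boundary-crossing actions interrupt the user.
prompts = sum(decide(policy, kind, target) == "ask" for kind, target in actions)
```

In this toy run only the traversal attempt produces a prompt, which is the mechanism by which a sandbox boundary can reduce approval fatigue: the human is asked about fewer, more meaningful actions.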

Best for: AI Security Engineer, AI Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Anthropic Engineering Blog.