How we contain Claude across products

2026-05-22 · Source: Anthropic Engineering Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering · Depth: Advanced, extended

Summary

Anthropic details how it contains Claude across products such as claude.ai, Claude Code, and Cowork, as increasingly capable agents widen the potential "blast radius" of a failure. The company weighs the productivity gains of granting agents broad access against the security measures needed to limit damage. It identifies three risk categories: user misuse, model misbehavior (e.g., Claude escaping sandboxes or decrypting answer keys), and external attackers. Defenses are applied at three layers: the agent's environment, the model itself, and external content. The article outlines three isolation patterns: ephemeral gVisor containers for claude.ai, human-in-the-loop sandboxes for Claude Code (mitigating approval fatigue with an 84% reduction in prompts), and local VMs for Claude Cowork. Anthropic also shares lessons from incidents, including pre-trust config execution, the user as an injection vector (exfiltrating AWS credentials), and data exfiltration through approved API domains, and it emphasizes that battle-tested primitives are more reliable than custom components.

Key takeaway

For AI Security Engineers deploying agentic systems: prioritize deterministic environmental containment over probabilistic model-layer defenses. Incidents like data exfiltration through approved API domains show that even best-in-class model defenses can fail when hard boundaries are absent. Implement strong sandboxes, VMs, and egress controls, and be critical of custom security components. Focus on capping the "blast radius" through robust isolation whose strength matches the user's capacity for technical oversight.

Key insights

Effective agent security prioritizes deterministic environmental containment over probabilistic model-layer defenses to cap blast radius.

Principles

Prioritize environmental containment over model-layer steering.
Align isolation strength with the user's capacity for technical oversight.
Custom security components are often the weakest link.

Method

Implement agent containment using ephemeral containers, human-in-the-loop sandboxes, or local VMs, tailoring the approach to user expertise and the access the agent requires. Defend at three layers: the environment, the model, and external content.

In practice

Defer project config parsing until the user has trusted the project.
Implement egress controls for all agent traffic.
Use defensive proxies for API traffic.
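
The first two practices above can be illustrated with a minimal Python sketch. It is not Anthropic's implementation: the allowlist contents, the `agent-config.json` file name, and both function names are hypothetical, chosen only for illustration. The sketch shows an exact-match host allowlist for outbound requests and a config loader that reads nothing from a project until its directory has been marked as trusted.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical allowlist. A real deployment would enforce this at a proxy or
# firewall outside the agent's process, not inside code the agent can edit.
ALLOWED_HOSTS = {"api.example.com", "pypi.org"}


def is_egress_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose exact host is on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Exact match: "api.example.com.evil.test" must not pass as "api.example.com".
    return parts.scheme == "https" and host in allowed_hosts


def load_project_config(project_dir: Path, trusted_dirs: set[Path]) -> dict:
    """Parse the project config only after the user has trusted the directory."""
    resolved = project_dir.resolve()
    if resolved not in trusted_dirs:
        # Untrusted: do not read or parse anything from the project yet.
        return {}
    config_path = resolved / "agent-config.json"  # hypothetical file name
    if not config_path.is_file():
        return {}
    return json.loads(config_path.read_text(encoding="utf-8"))
```

A host allowlist on its own does not stop the failure mode the article reports, data exfiltration through approved API domains, because the destination host is itself permitted. That is presumably where a defensive proxy that inspects API traffic comes in; the check above is only the coarse first gate.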

Topics

Agentic AI Security
Containment Architectures
Prompt Injection Defense
Sandbox Technology
Virtual Machine Isolation
Egress Controls

Code references

anthropic-experimental/sandbox-runtime

Best for: AI Security Engineer, AI Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Anthropic Engineering Blog.