How we contain Claude across products

Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Intermediate, quick

Summary

Anthropic has described how it contains Claude across its products, including Claude.ai, Claude Code, and Claude Cowork. The core strategy combines process sandboxes, VMs, filesystem boundaries, and egress controls to set a hard boundary on what an agent can do, preventing problems such as credential exfiltration. Claude.ai uses gVisor, while Claude Code, which executes locally, uses Seatbelt on macOS and Bubblewrap on Linux. Claude Cowork, designed for more sensitive environments, runs inside a full VM using Apple's Virtualization framework on macOS and HCS (Host Compute Service) on Windows. The company also acknowledged past risks, such as the `api.anthropic.com/v1/files` exfiltration vector, which shows that its security measures are still evolving.
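To make the egress-control point concrete, here is a minimal Python sketch of why a host-level allowlist can still leave an exfiltration path open. It assumes, without confirmation from the source, that the `api.anthropic.com/v1/files` risk comes from an allowlisted host that also accepts uploads. The names `ALLOWED_HOSTS`, `BLOCKED_PATH_PREFIXES`, `host_only_allows`, and `path_aware_allows` are hypothetical and not part of any Anthropic tool.

```python
from urllib.parse import urlsplit

# Hypothetical policy for illustration only: hosts the agent may reach,
# plus path prefixes on those hosts that stay blocked.
ALLOWED_HOSTS = {"api.anthropic.com"}
BLOCKED_PATH_PREFIXES = {"api.anthropic.com": ("/v1/files",)}


def host_only_allows(url: str) -> bool:
    """Naive check: admit any URL whose host is on the allowlist."""
    return urlsplit(url).hostname in ALLOWED_HOSTS


def path_aware_allows(url: str) -> bool:
    """Stricter check: the host must be allowed and the path must not be blocked."""
    parts = urlsplit(url)
    host = parts.hostname
    if host not in ALLOWED_HOSTS:
        return False
    path = parts.path or "/"
    blocked = BLOCKED_PATH_PREFIXES.get(host, ())
    # Block the prefix itself and anything nested beneath it.
    return not any(path == p or path.startswith(p + "/") for p in blocked)


upload_url = "https://api.anthropic.com/v1/files"
print(host_only_allows(upload_url))   # the host-only rule lets the upload through
print(path_aware_allows(upload_url))  # the path-aware rule refuses it
```

This is a sketch, not a hardened filter: a real egress proxy would also need to normalize paths, handle redirects, and inspect the request method.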

Key takeaway

For AI security engineers designing agent deployments, Anthropic's sandboxing overview underscores the need for multi-layered containment. Choose among techniques such as gVisor, local process sandboxes, and full VMs based on the agent's environment and the sensitivity of its data. Consider exploring Anthropic's open-source `srt` (Anthropic Sandbox Runtime) tool to strengthen your own agent security and mitigate exfiltration risks.

Key insights

Robust, multi-layered sandboxing is crucial for containing AI agent actions and preventing data exfiltration.

Principles

Method

Constrain AI agent actions using process sandboxes, virtual machines, filesystem boundaries, and egress controls to establish hard boundaries and prevent data exfiltration.
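As an illustration of the filesystem-boundary idea, the following Python sketch checks whether a path requested by an agent stays inside a designated workspace. The function name `is_inside` and the `agent_workspace` directory are hypothetical. This is an application-level check only; it does not replace OS-level enforcement of the kind Seatbelt, Bubblewrap, or a VM provides.

```python
from pathlib import Path


def is_inside(root: Path, candidate: str) -> bool:
    """Return True if candidate, resolved relative to root, stays under root."""
    root_resolved = root.resolve()
    # resolve() collapses ".." segments and follows any symlinks that exist;
    # an absolute candidate replaces root entirely when joined.
    target = (root_resolved / candidate).resolve()
    return target == root_resolved or root_resolved in target.parents


workspace = Path("agent_workspace")  # hypothetical sandbox root
print(is_inside(workspace, "src/main.py"))     # stays within the workspace
print(is_inside(workspace, "../outside.txt"))  # escapes via a parent reference
```

Resolving before comparing is the key design choice: comparing raw strings would accept `agent_workspace/../outside.txt` because it merely starts with the workspace prefix.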

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, AI Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.