Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming
Summary
A new study investigates how effective prompt-injection attacks are against current frontier computer-using agents (CUAs), challenging previous reports of 42-98% attack success rates (ASRs) that were often based on retired or most-vulnerable models. The researchers released CUA-HandCrafted, a public benchmark comprising 793 episodes across 24 multi-step web tasks, 56 attack templates, 8 attack families, and 4 system-prompt configurations. Against Claude Sonnet 4.6 and GPT-5.4, the benchmark recorded 0 successes in 140 multi-step attack episodes, which gives a Clopper-Pearson 95% upper bound of 2.60% on the true success rate. A prompt ablation attributed this resistance to the model weights rather than to the system prompt. However, this safety does not generalize: hand-crafted skill-injection attacks against the same models reached up to 100% success on a sister coding-agent benchmark, SkillBench. The findings suggest that high reported attack success rates owe more to RL-optimized injection text than to the attack category itself, and that frontier safety hardening is domain-conditioned, specific to the heavily targeted browser surface.
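The 2.60% figure can be checked from the 0/140 result alone. The following is a minimal sketch, assuming the summary refers to the upper end of the two-sided exact (Clopper-Pearson) interval; the function names are illustrative and not taken from the benchmark's code.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson_upper(successes: int, trials: int, confidence: float = 0.95) -> float:
    """Upper end of the two-sided exact (Clopper-Pearson) interval.

    Assumption: the reported bound uses alpha/2 in the upper tail.
    """
    alpha = 1.0 - confidence
    if successes == trials:
        return 1.0
    # Find p such that P(X <= successes | p) = alpha / 2.
    # The CDF decreases as p grows, so plain bisection is enough.
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(successes, trials, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


upper = clopper_pearson_upper(0, 140)
print(f"{upper:.2%}")  # prints 2.60%
```

With zero successes the bound has the closed form 1 - (alpha/2)^(1/n), so 140 clean episodes can rule out attack success rates above roughly 2.6% but cannot show that the rate is zero.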
Key takeaway
AI security engineers evaluating frontier computer-using agents should recognize that the current resistance to browser-based prompt injection in models like Claude Sonnet 4.6 and GPT-5.4 is domain-specific. Do not assume this hardening extends to other modalities, such as coding agents, where the same models remain highly vulnerable to skill-injection. Build and run a separate red-teaming suite for each CUA application instead of relying on generalized browser-centric benchmarks.
Key insights
Frontier computer-using agents exhibit domain-conditioned safety, resisting browser-based prompt injection but remaining vulnerable in other modalities like coding.
Principles
- Frontier CUA safety is domain-conditioned.
- High ASRs often stem from RL-optimized injection text.
- Browser-domain safety does not generalize to other CUA modalities.
Method
The CUA-HandCrafted benchmark uses 793 episodes, 24 web tasks, 56 attack templates, 8 attack families, and 4 system-prompt configurations to test prompt injection against frontier CUAs.
In practice
- Test CUA safety across diverse domains.
- Avoid extrapolating browser safety to other modalities.
- Release optimized attack strings for reproducibility.
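The first two practices above amount to reporting attack success per model and per domain rather than as one pooled number. A minimal sketch of that bookkeeping follows; the `Episode` record, the `asr_by_domain` helper, and the sample rows are hypothetical placeholders, not the benchmark's schema or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One red-team episode (hypothetical record layout)."""
    model: str
    domain: str  # e.g. "browser" or "coding"
    attack_succeeded: bool


def asr_by_domain(episodes):
    """Return {(model, domain): attack success rate}, never pooling across domains."""
    counts = defaultdict(lambda: [0, 0])  # [successes, total]
    for ep in episodes:
        bucket = counts[(ep.model, ep.domain)]
        bucket[0] += int(ep.attack_succeeded)
        bucket[1] += 1
    return {key: succ / total for key, (succ, total) in counts.items()}


# Made-up episodes for illustration only; these are not measured results.
sample = [
    Episode("model-x", "browser", False),
    Episode("model-x", "browser", False),
    Episode("model-x", "browser", False),
    Episode("model-x", "coding", True),
    Episode("model-x", "coding", True),
]
rates = asr_by_domain(sample)
for (model, domain), rate in sorted(rates.items()):
    print(f"{model} / {domain}: ASR = {rate:.0%}")
```

Keying the tally on the (model, domain) pair keeps a strong browser result from masking a weak coding-agent result for the same model.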
Topics
- Computer-Using Agents
- Prompt Injection
- Red Teaming
- Domain-Conditioned Safety
- Browser Automation
- SkillBench
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.