Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A new study examines how well prompt-injection attacks work against current frontier computer-using agents (CUAs). It challenges earlier reports of 42-98% attack success rates, which were often measured on retired models or on the most vulnerable models available. The researchers released CUA-HandCrafted, a public benchmark of 793 episodes spanning 24 multi-step web tasks, 56 attack templates, 8 attack families, and 4 system-prompt configurations. Against Claude Sonnet 4.6 and GPT-5.4, none of 140 multi-step attacks succeeded (0/140), giving a Clopper-Pearson 95% upper bound of 2.60% on the true success rate. A prompt ablation confirmed that this resistance stems from the model weights rather than the system prompt. The resistance does not generalize, however. On a sister coding-agent benchmark, SkillBench, hand-crafted skill-injection attacks against the same models reached success rates of up to 100%. The findings suggest two things. First, the high attack success rates reported previously are driven largely by RL-optimized injection text rather than by the attack categories themselves. Second, frontier safety hardening is domain-conditioned, concentrated on the heavily targeted browser surface.
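
The 2.60% figure can be reproduced from the exact Clopper-Pearson interval. With x successes in n trials, the upper end of the two-sided (1 - alpha) interval is the success probability p at which P(X <= x) = alpha/2. For x = 0 this reduces to 1 - (alpha/2)^(1/n). Below is a minimal stdlib-only Python sketch. It assumes the reported bound is the upper end of the two-sided 95% interval, which is the convention that yields 2.60% for 0/140. The function names are illustrative, not taken from the study.

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def clopper_pearson_upper(x: int, n: int, alpha: float = 0.05) -> float:
    """Upper end of the exact two-sided (1 - alpha) Clopper-Pearson interval.

    Solves binom_cdf(x, n, p) = alpha / 2 for p by bisection. The CDF is
    decreasing in p, so the root is unique.
    """
    if x >= n:
        return 1.0
    lo, hi = x / n, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha / 2:
            lo = mid  # CDF still above alpha/2, so the root lies above mid
        else:
            hi = mid
    return hi


# 0 successful attacks out of 140 multi-step attempts
ub = clopper_pearson_upper(0, 140)
print(f"{ub:.2%}")
# For x = 0 the bisection result matches the closed form 1 - (alpha/2) ** (1/n)
print(f"{1 - 0.025 ** (1 / 140):.2%}")
```

Both lines print 2.60%. Bisection on the binomial CDF avoids a dependency on a beta-quantile function, so it also works for non-zero success counts, where no simple closed form exists.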

Key takeaway

For AI Security Engineers evaluating frontier computer-using agents: the browser-based prompt-injection resistance of models such as Claude Sonnet 4.6 and GPT-5.4 is domain-specific. Do not assume this hardening extends to other modalities, such as coding agents, where the same models remain highly vulnerable to skill injection. Design and run domain-specific red-teaming for each CUA application rather than treating results from browser-centric benchmarks as evidence of general safety.
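
As a planning aid for per-domain red-teaming, one can ask how many attack episodes a single domain needs, with zero successes, before the exact 95% upper bound on its attack success rate drops below a chosen target. A clean run in one domain bounds only that domain. The sketch below is a minimal illustration that assumes independent episodes and the two-sided Clopper-Pearson convention that yields the reported 2.60% for 0/140. `min_clean_trials` and `zero_success_upper` are hypothetical helpers, not part of the study.

```python
import math


def zero_success_upper(n: int, alpha: float = 0.05) -> float:
    """Exact two-sided (1 - alpha) Clopper-Pearson upper bound for 0 successes in n trials."""
    return 1 - (alpha / 2) ** (1 / n)


def min_clean_trials(target: float, alpha: float = 0.05) -> int:
    """Smallest n such that 0 successes in n trials gives an upper bound <= target.

    From 1 - (alpha/2) ** (1/n) <= target it follows that
    n >= ln(alpha/2) / ln(1 - target).
    """
    if not 0 < target < 1:
        raise ValueError("target must be in (0, 1)")
    # log1p(-target) computes ln(1 - target) accurately for small targets
    return math.ceil(math.log(alpha / 2) / math.log1p(-target))


n = min_clean_trials(0.05)
print(n, f"{zero_success_upper(n):.2%}")
```

Under these assumptions, `min_clean_trials(0.05)` returns 72 and `min_clean_trials(0.026)` returns 141. The 0/140 browser run therefore sits just above a 2.6% bound (about 2.6005%). Each additional domain, such as a coding agent, would need its own clean run of comparable size.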

Key insights

Frontier computer-using agents exhibit domain-conditioned safety, resisting browser-based prompt injection but remaining vulnerable in other modalities like coding.

Method

The CUA-HandCrafted benchmark uses 793 episodes, 24 web tasks, 56 attack templates, 8 attack families, and 4 system-prompt configurations to test prompt injection against frontier CUAs.

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.