Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming

2026-06-03 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A new study measures how well prompt-injection attacks work against current frontier computer-using agents (CUAs), challenging earlier reports of 42-98% success rates that were often obtained on retired or most-vulnerable models. The researchers released CUA-HandCrafted, a public benchmark comprising 793 episodes across 24 multi-step web tasks, 56 attack templates, 8 attack families, and 4 system-prompt configurations. Against Claude Sonnet 4.6 and GPT-5.4, the benchmark recorded 0/140 successful multi-step attacks, giving a Clopper-Pearson 95% upper bound of 2.60% on the true success rate. A prompt ablation confirmed that this resistance stems from the model weights rather than from the system prompt. However, the safety does not generalize: on a sister coding-agent benchmark, SkillBench, hand-crafted skill-injection attacks reached up to 100% success against the same models. The findings suggest that previously reported high attack success rates are largely due to RL-optimized injection text rather than to the attack category, and that frontier safety hardening is domain-conditioned, concentrated on the heavily targeted browser surface.
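The 2.60% figure can be checked with a short calculation. The sketch below is not the authors' code: it is a minimal pure-Python Clopper-Pearson upper bound, and the function names `binom_cdf` and `clopper_pearson_upper` are illustrative. It assumes the two-sided convention (alpha/2 in each tail), which is the convention that reproduces the quoted number for 0 successes in 140 trials.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson_upper(successes, trials, confidence=0.95):
    """Upper end of the two-sided exact (Clopper-Pearson) interval.

    Solves P(X <= successes | p) = alpha / 2 for p by bisection.
    """
    alpha = 1.0 - confidence
    if successes == trials:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # The CDF decreases as p grows, so move right while it is still too large.
        if binom_cdf(successes, trials, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

upper = clopper_pearson_upper(0, 140)
print(f"{upper:.2%}")
```

With zero observed successes the bound has the closed form 1 - (alpha/2)^(1/n), so the bisection is only needed for non-zero counts.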

Key takeaway

AI security engineers evaluating frontier computer-using agents should treat the browser-based prompt-injection resistance of models such as Claude Sonnet 4.6 and GPT-5.4 as domain-specific. Do not assume this hardening extends to other modalities, such as coding agents, where the same models remain highly vulnerable to skill injection. Red-team each CUA application in its own domain rather than relying on browser-centric benchmarks.

Key insights

Frontier computer-using agents exhibit domain-conditioned safety, resisting browser-based prompt injection but remaining vulnerable in other modalities like coding.

Principles

Frontier CUA safety is domain-conditioned.
High attack success rates (ASRs) often stem from RL-optimized injection text rather than from the attack category.
Browser-domain safety does not generalize to other CUA modalities.

Method

The CUA-HandCrafted benchmark tests prompt injection against frontier CUAs using 793 episodes spanning 24 multi-step web tasks, 56 attack templates, 8 attack families, and 4 system-prompt configurations.

In practice

Test CUA safety across diverse domains.
Avoid extrapolating browser safety to other modalities.
Release optimized attack strings for reproducibility.
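One way to act on the first two points is to report attack success per domain instead of as a single pooled number. The sketch below is an illustration only: the function `asr_by_domain`, the record format, and the `toy_episodes` values are all hypothetical placeholders, not data or tooling from the study.

```python
from collections import defaultdict

def asr_by_domain(episodes):
    """Compute the attack success rate per domain.

    episodes: iterable of (domain, attack_succeeded) pairs (assumed format).
    """
    counts = defaultdict(lambda: [0, 0])  # domain -> [successes, trials]
    for domain, succeeded in episodes:
        counts[domain][0] += int(succeeded)
        counts[domain][1] += 1
    return {domain: s / n for domain, (s, n) in counts.items()}

# Placeholder records for illustration, NOT results from the benchmark.
toy_episodes = (
    [("browser", False)] * 4
    + [("coding", True)] * 3
    + [("coding", False)]
)

per_domain = asr_by_domain(toy_episodes)
pooled = sum(succeeded for _, succeeded in toy_episodes) / len(toy_episodes)

print(per_domain)
print(pooled)
```

In this toy input the pooled rate sits between the two per-domain rates and describes neither, which is why a browser-only or pooled figure should not be read as a statement about coding agents.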

Topics

Computer-Using Agents
Prompt Injection
Red Teaming
Domain-Conditioned Safety
Browser Automation
SkillBench

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.