Computer-Use and TOCTOU: What You Click Is Not What You Get!

Source: Embrace The Red · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Intermediate, short

Summary

The article describes a Time-of-Check to Time-of-Use (TOCTOU) vulnerability in AI Computer-Use agents, demonstrated with Claude. The vulnerability arises because the screen can change while the LLM is running inference: the agent decides based on one screenshot, but its click lands on whatever is on screen by the time the action executes. The author reproduced an attack similar to one disclosed by Jun Kokatsu for ChatGPT Operator. A basic experiment showed that a Claude Computer-Use agent could be tricked by a button whose label changed from "OKAY" to "ENTER THE MATRIX" after 2 seconds. A more malicious variant has the agent click a "Continue" button on a phishing page; by the time the click lands, that position is occupied by the "Send" button of a pre-drafted Outlook email, so an arbitrary message is sent. To give the Outlook page time to load fully before the click, a prompt injection asked Claude to calculate "1+1" using bash, which introduced a 4-5 second delay. The vulnerability was reported in October last year to Anthropic, which acknowledged it and later addressed it in its Cowork with Computer-Use feature by verifying that pixels have not changed before acting. Other vendors were also found to be vulnerable, and those findings were reported as well.

Key takeaway

AI Security Engineers evaluating agent-based systems must account for Time-of-Check to Time-of-Use (TOCTOU) vulnerabilities. An agent can perform unintended actions if the UI changes while the LLM is reasoning. Implement UI state verification, such as re-checking that the relevant pixels are unchanged, immediately before any agent-initiated action. This narrows the timing window that attackers exploit to trigger unintended clicks or data exfiltration.
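
A minimal sketch of such a pre-action check, in Python, may make the idea concrete. It is not Anthropic's implementation, which the article does not detail. Everything named here (Region, region_digest, guarded_click, the capture and click callbacks, and the 8-pixel margin) is hypothetical, and frames are assumed to be row-major lists of 8-bit grayscale values.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List

Frame = List[List[int]]  # assumed: row-major rows of 0-255 grayscale pixels


@dataclass(frozen=True)
class Region:
    x: int
    y: int
    width: int
    height: int


def region_digest(frame: Frame, region: Region) -> str:
    """Hash the pixel values that fall inside `region`."""
    h = hashlib.sha256()
    for row in frame[region.y:region.y + region.height]:
        h.update(bytes(row[region.x:region.x + region.width]))
    return h.hexdigest()


def guarded_click(
    capture: Callable[[], Frame],
    click: Callable[[int, int], None],
    decided_on: Frame,
    x: int,
    y: int,
    margin: int = 8,
) -> bool:
    """Click (x, y) only if the area around it still matches the frame
    the model reasoned over. Returns False when the click is refused."""
    region = Region(max(0, x - margin), max(0, y - margin), 2 * margin, 2 * margin)
    at_check = region_digest(decided_on, region)  # time of check
    at_use = region_digest(capture(), region)     # just before time of use
    if at_check != at_use:
        return False  # screen changed during inference: do not act
    click(x, y)
    return True
```

A check like this shrinks the window rather than eliminating it, since a change can still occur between the fresh capture and the click itself. On refusal, the caller would typically take a new screenshot and let the model decide again.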

Key insights

AI Computer-Use agents are vulnerable to TOCTOU attacks, where screen changes during inference lead to unintended actions.

Method

An attacker crafts a phishing page with a button at the same coordinates as a target action (e.g., "Send" in Outlook). A timing delay (e.g., via prompt injection) ensures the target UI loads before the agent clicks.
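
The timing relationship can be illustrated with a small simulation on a simulated clock. This is a sketch only: SwappingPage and run_agent are hypothetical names, no real browser or agent is involved, and the 2-second swap and 4.5-second inference delay are taken from the figures quoted in the summary.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SwappingPage:
    """Models one screen position whose content changes at `swap_at` seconds."""
    swap_at: float
    before: str = "Continue"
    after: str = "Send"

    def label_at(self, t: float) -> str:
        return self.before if t < self.swap_at else self.after


def run_agent(page: SwappingPage, screenshot_time: float,
              inference_seconds: float) -> Tuple[str, str]:
    """Return (label the model saw, label actually clicked)."""
    seen = page.label_at(screenshot_time)                         # time of check
    clicked = page.label_at(screenshot_time + inference_seconds)  # time of use
    return seen, clicked


page = SwappingPage(swap_at=2.0)
print(run_agent(page, screenshot_time=0.0, inference_seconds=4.5))
print(run_agent(page, screenshot_time=0.0, inference_seconds=1.0))
```

With the longer delay, the label seen and the label clicked differ, which is the mismatch the attack depends on. With the shorter delay, the click lands before the swap and both labels agree.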

Best for: AI Security Engineer, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Embrace The Red.