What happened after 2,000 people tried to hack my AI assistant

2026-06-26 · Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Intermediate, quick

Summary

Fernando Irarrázaval conducted a challenge on hackmyclaw.com, inviting 2,000 participants to attempt leaking secrets from his OpenClaw AI assistant via email. Despite 6,000 attempts, \$500 in token spend, and a Google account suspension due to email volume, no participant successfully extracted the secret. The AI assistant, powered by Opus 4.6, utilized specific anti-prompt-injection rules, including directives to never reveal credentials, modify files, execute commands, or exfiltrate data based on email content. This outcome suggests that the significant efforts by AI labs to train frontier models against injection attacks, as noted in the GPT-5.6 system card, are proving effective in enhancing their resilience.

Key takeaway

For AI Security Engineers evaluating LLM deployment risks, this challenge highlights improved prompt injection resistance in frontier models like Opus 4.6. However, you should not interpret 6,000 failed attempts as a guarantee against more sophisticated future attacks. Always implement robust defense-in-depth strategies and avoid deploying production systems where a successful injection could cause irreversible damage, regardless of initial testing results.

Key insights

Frontier large language models demonstrate increased resistance to prompt injection attacks due to dedicated safety training.

Principles

Dedicated anti-injection training enhances LLM security.
Absence of successful attacks does not guarantee future immunity.

Method

The OpenClaw instance used explicit anti-prompt-injection rules within its prompt, forbidding revelation of secrets, file modification, command execution, or data exfiltration.

In practice

Test LLM security with red-teaming challenges.
Implement explicit anti-injection rules in system prompts.

Topics

Prompt Injection
AI Security
Large Language Models
Red Teaming
Opus 4.6
OpenClaw

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, AI Engineer, MLOps Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.