What happened after 2,000 people tried to hack my AI assistant
Summary
Fernando Irarrázaval conducted a challenge on hackmyclaw.com, inviting 2,000 participants to attempt leaking secrets from his OpenClaw AI assistant via email. Despite 6,000 attempts, \$500 in token spend, and a Google account suspension due to email volume, no participant successfully extracted the secret. The AI assistant, powered by Opus 4.6, utilized specific anti-prompt-injection rules, including directives to never reveal credentials, modify files, execute commands, or exfiltrate data based on email content. This outcome suggests that the significant efforts by AI labs to train frontier models against injection attacks, as noted in the GPT-5.6 system card, are proving effective in enhancing their resilience.
Key takeaway
For AI Security Engineers evaluating LLM deployment risks, this challenge highlights improved prompt injection resistance in frontier models like Opus 4.6. However, you should not interpret 6,000 failed attempts as a guarantee against more sophisticated future attacks. Always implement robust defense-in-depth strategies and avoid deploying production systems where a successful injection could cause irreversible damage, regardless of initial testing results.
Key insights
Frontier large language models demonstrate increased resistance to prompt injection attacks due to dedicated safety training.
Principles
- Dedicated anti-injection training enhances LLM security.
- Absence of successful attacks does not guarantee future immunity.
Method
The OpenClaw instance used explicit anti-prompt-injection rules within its prompt, forbidding revelation of secrets, file modification, command execution, or data exfiltration.
In practice
- Test LLM security with red-teaming challenges.
- Implement explicit anti-injection rules in system prompts.
Topics
- Prompt Injection
- AI Security
- Large Language Models
- Red Teaming
- Opus 4.6
- OpenClaw
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, AI Engineer, MLOps Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.