I was hacked...

2026-04-03 · Source: Matthew Berman · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Advanced, long

Summary

An AI hacking challenge pitted Pliny the Liberator, a well-known AI jailbreaker, against the host's personal "OpenClaw" AI system, which scans incoming email. Pliny was given five attempts to break in, with the goals of reaching personal files, emails, and passwords and draining the host's API token wallet. The first attempts used "tokenade" payloads (crafted, high-token-count messages disguised as emojis) to identify the underlying model and overwhelm it, but Gmail's spam filter caught them. After the sender was whitelisted, Pliny tried a "siege attack" of millions of tokens meant to deplete the host's quota, which OpenClaw quarantined. Structured jailbreak templates and system command impersonation followed, and both were quarantined as well. Even after a hint revealed that the model was "Opus 4.6," Pliny's final data exfiltration probe was blocked, so the OpenClaw setup held up against every attack described here.

Key takeaway

AI engineers building or deploying AI systems should use the most capable reasoning model available as the first defense layer. The initial scanner has to be robust, because smaller or instant models are significantly more susceptible to prompt injection and resource-draining attacks. Test the system regularly against sophisticated jailbreaking techniques so that vulnerabilities are found and patched before they are exploited.

Key insights

Robust AI security requires advanced models and proactive defenses against sophisticated prompt injection and resource exhaustion attacks.

Principles

Use the most capable model available as the front-line scanner.
Human-in-the-loop review is critical for AI security.

Method

Attackers probe AI systems to identify models, then use tokenade payloads, jailbreak commands, or system command impersonation to exploit vulnerabilities or deplete resources.
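
One way to screen for emoji-disguised padding before a message reaches the model is sketched below. The summary does not say how tokenade payloads are built, so this assumes they hide bulk in invisible Unicode code points (variation selectors, zero-width characters, tag characters). The names count_invisible and looks_padded, the code point ranges, and both thresholds are hypothetical and would need tuning against real mail.

```python
# Hypothetical detector for emoji-disguised padding. The ranges and
# thresholds are illustrative assumptions, not taken from the source.
INVISIBLE_RANGES = [
    (0x200B, 0x200F),    # zero-width space/joiners and direction marks
    (0x2060, 0x2064),    # word joiner and invisible operators
    (0xFE00, 0xFE0F),    # variation selectors
    (0xE0000, 0xE007F),  # tag characters
    (0xE0100, 0xE01EF),  # variation selectors supplement
]


def count_invisible(text: str) -> int:
    """Count code points that render as nothing on their own."""
    return sum(
        1
        for ch in text
        if any(lo <= ord(ch) <= hi for lo, hi in INVISIBLE_RANGES)
    )


def looks_padded(text: str, min_count: int = 16, max_ratio: float = 0.3) -> bool:
    """Flag text where invisible code points are both numerous and dominant."""
    if not text:
        return False
    hidden = count_invisible(text)
    return hidden >= min_count and hidden / len(text) > max_ratio
```

Requiring both a minimum count and a minimum ratio keeps ordinary emoji, which legitimately carry one or two such code points, from being flagged.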

In practice

Implement strong spam filters for email-based AI systems.
Quarantine suspicious inputs to prevent token exhaustion.
Use AI code review tools such as Greptile to maintain code quality.
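
The quarantine step can be sketched as a cheap size check that runs before any model call, so that an oversized message costs no tokens. This is a minimal illustration, not how OpenClaw works: the Verdict type, the triage function, the 4,000-token budget, and the four-bytes-per-token heuristic are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical limits; tune them for your own model and budget.
MAX_ESTIMATED_TOKENS = 4_000
BYTES_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


@dataclass
class Verdict:
    action: str            # "scan" or "quarantine"
    estimated_tokens: int
    reason: str


def estimate_tokens(text: str) -> int:
    """Estimate token count from UTF-8 byte length without calling a model."""
    return len(text.encode("utf-8")) // BYTES_PER_TOKEN + 1


def triage(body: str, max_tokens: int = MAX_ESTIMATED_TOKENS) -> Verdict:
    """Decide whether a message may be sent to the scanning model at all."""
    estimate = estimate_tokens(body)
    if estimate > max_tokens:
        return Verdict("quarantine", estimate, "estimated token count exceeds budget")
    return Verdict("scan", estimate, "within budget")
```

Estimating from byte length rather than character count means multi-byte invisible characters inflate the estimate, which errs toward quarantining payloads that look short on screen. Quarantined messages would then wait for a human to review them.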

Topics

AI Hacking Challenge
Prompt Injection Techniques
Tokenade Attacks
OpenClaw AI System
Anthropic Opus 4.6

Best for: AI Security Engineer, AI Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Matthew Berman.