Breaking Opus 4.7 with ChatGPT (Hacking Claude's Memory)

Source: Embrace The Red · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Advanced, medium

Summary

A recent demonstration exploited Claude Opus 4.7 using an adversarial image generated by ChatGPT, achieving indirect prompt injection that persisted false memories. The attack used a puzzle image containing hidden text; when Opus analyzed the image, the hidden text steered it into invoking its "memory_user_edits" tool, and Claude stored fabricated user details such as "User's name is Neo" and "User is 43 years old (as of April 2026)". Although Opus 4.6 and later models are described as more resilient, this attack succeeded in 5 of 10 repeated trials, even though Claude often flagged the activity as suspicious. The author reported the vulnerability to Anthropic in March 2026, and the specific adversarial example stopped working within 24 hours of publication, which suggests rapid mitigation. The episode illustrates how AI agents face an unusually adversarial environment compared with other technologies, since content they merely analyze can carry instructions.
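A success rate measured over only 10 trials carries wide uncertainty. The sketch below is not from the original article; it computes a standard 95% Wilson score interval for the reported 5-of-10 result, to show how loosely ten trials pin down the underlying rate. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# The 5/10 figure is the one reported in the summary above.
low, high = wilson_interval(5, 10)
print(f"{low:.2f} to {high:.2f}")  # 0.24 to 0.76
```

Read this way, the 5/10 figure is consistent with a true success rate anywhere from roughly a quarter to three quarters of attempts, so it is better treated as evidence that the attack works than as a precise rate.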

Key takeaway

AI security engineers developing or deploying advanced LLMs with memory and tool-use capabilities should prioritize robust defenses against indirect prompt injection. Even resilient models such as Opus 4.7 can be hijacked by adversarial images into storing false information. Red-team your models continuously, especially their tool-invocation mechanisms, and monitor for rapid behavioral shifts to counter evolving adversarial tactics.
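One common defensive pattern is to gate state-persisting tool calls whenever the current turn contains untrusted content such as an uploaded image. The following is a minimal sketch of that idea under stated assumptions: `TurnContext`, `ToolCall`, `PERSISTENT_TOOLS`, and `gate_tool_call` are hypothetical names for an application-side policy layer, not part of any Anthropic API, and only the tool name "memory_user_edits" comes from the source.

```python
from dataclasses import dataclass, field

# Tools that persist state across sessions. Only "memory_user_edits" is
# named in the source; a real deployment would list its own tools here.
PERSISTENT_TOOLS = {"memory_user_edits"}


@dataclass
class TurnContext:
    # True when the turn includes content the user did not type directly
    # (images, fetched pages, attached files).
    has_untrusted_input: bool
    # True once the user has explicitly approved the pending write.
    user_confirmed: bool = False


@dataclass
class ToolCall:
    name: str
    arguments: dict = field(default_factory=dict)


def gate_tool_call(call: ToolCall, ctx: TurnContext) -> str:
    """Return "allow" or "confirm" for a tool call requested by the model."""
    if call.name not in PERSISTENT_TOOLS:
        return "allow"  # non-persistent tools are outside this policy
    if not ctx.has_untrusted_input:
        return "allow"  # the request stems from trusted input only
    # Persistent write requested while untrusted content is in context:
    # require explicit user approval before executing it.
    return "allow" if ctx.user_confirmed else "confirm"


call = ToolCall("memory_user_edits", {"text": "User's name is Neo"})
print(gate_tool_call(call, TurnContext(has_untrusted_input=True)))  # confirm
```

The design choice here is to treat the presence of untrusted input, rather than the model's own judgment of suspiciousness, as the trigger, since the summary notes that the attack sometimes succeeded even when the model detected suspicious activity.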

Key insights

Claude Opus 4.7 was vulnerable to indirect prompt injection via adversarial images, which led it to write false user details into its persistent memory.

Method

Generate an adversarial image containing hidden text and tool-steering hints using ChatGPT. Feed the image to the target LLM so that analyzing it triggers a tool invocation that modifies the model's memory.
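For red-teaming your own deployment against this class of attack, the measurement side can be sketched as a small harness that repeats a trial in fresh sessions and counts how often the memory tool is invoked. This is an illustrative sketch, not the author's tooling: `run_trial` is a hypothetical callable you would implement against your own system, returning the names of the tools the model invoked in one session.

```python
from typing import Callable, Iterable

MEMORY_TOOL = "memory_user_edits"  # tool name taken from the source


def injection_success_rate(
    run_trial: Callable[[], Iterable[str]], trials: int = 10
) -> float:
    """Fraction of trials in which the memory tool was invoked.

    run_trial is a hypothetical hook: it should start a fresh session,
    submit the test input, and return the invoked tool names.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    hits = sum(1 for _ in range(trials) if MEMORY_TOOL in set(run_trial()))
    return hits / trials


# Stubbed usage: alternate between a hijacked and a clean session.
outcomes = iter([["memory_user_edits"], []] * 5)
print(injection_success_rate(lambda: next(outcomes), trials=10))  # 0.5
```

Counting tool invocations rather than inspecting the model's reply text matters because, per the summary, the model sometimes voiced suspicion and still performed the write.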

Best for: AI Security Engineer, Machine Learning Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Embrace The Red.