Where's the raccoon with the ham radio? (ChatGPT Images 2.0)
Summary
OpenAI released ChatGPT Images 2.0 on April 21st, 2026. Sam Altman touted the image generation model as a leap comparable to going from GPT-3 to GPT-5. The test prompt asked for a "Where's Waldo style image but it's where is the raccoon holding a ham radio". gpt-image-1 failed to generate the raccoon, while Google's Nano Banana 2 placed it in an "Amateur Radio Club" booth. With default settings, the new gpt-image-2 also failed to include the raccoon. Run with `outputQuality` set to `high` and dimensions of `3840x2160`, however, gpt-image-2 produced a complex image that included the raccoon and ham radio, at a cost of approximately 40 cents for 13,342 output tokens. The model shows significant improvements in complex illustrations, accurate text rendering, and high-resolution, detailed output, including text in multiple languages.
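For budgeting, the cost figure above can be turned into a rough per-token rate. The sketch below back-calculates that rate from the single run quoted in the summary (about 40 cents for 13,342 output tokens). The result is an implied figure, not a published price, and the function names are hypothetical.

```python
def implied_rate_per_million(cost_usd: float, output_tokens: int) -> float:
    """Back out a USD-per-million-output-tokens rate from one observed run."""
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    return cost_usd / output_tokens * 1_000_000


def estimate_cost(output_tokens: int, rate_per_million: float) -> float:
    """Estimate the cost in USD of a run, given an assumed rate."""
    return output_tokens * rate_per_million / 1_000_000


# Figures from the summary: about $0.40 for 13,342 output tokens.
# The cost is approximate, so the implied rate is approximate too.
rate = implied_rate_per_million(0.40, 13_342)
print(f"Implied rate: about ${rate:.2f} per million output tokens")
```

The implied rate comes out near $30 per million output tokens, which makes it easy to estimate what a batch of similar high-quality, large-format images might cost.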
Key takeaway
For AI Product Managers evaluating image generation capabilities, ChatGPT Images 2.0's enhanced ability to render complex scenes, accurate text, and high-resolution outputs, especially in "thinking mode," makes it a strong contender. You should experiment with its `outputQuality` and `size` parameters to achieve desired detail and fidelity, and consider its multilingual text generation for diverse market applications. Be cautious about relying on models to self-verify image content.
Key insights
ChatGPT Images 2.0 significantly advances image generation, particularly in complex scene composition and accurate text rendering.
Principles
- Higher quality settings improve complex image generation accuracy.
- Models can struggle with specific object placement in "Where's Waldo" scenarios.
Method
Use `outputQuality: high` and maximum dimensions (e.g., `3840x2160`) with gpt-image-2 for complex, detailed image generation, especially when precise object inclusion is critical.
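A minimal sketch of the method above, assuming the OpenAI Python SDK's `client.images.generate` call. The summary names the setting `outputQuality`, while the SDK exposes a `quality` parameter, so that mapping is an assumption, as is the expectation that the response carries base64 image data in `data[0].b64_json`. The model name and size come from the summary, and the helper names are hypothetical.

```python
import base64


def build_image_request(prompt, quality="high", size="3840x2160",
                        model="gpt-image-2"):
    """Assemble keyword arguments for an image generation call.

    Assumption: `quality` is the SDK-side name for what the summary
    calls `outputQuality`.
    """
    # Fail early on a malformed size string such as "3840*2160".
    width, height = (int(part) for part in size.split("x"))
    if width <= 0 or height <= 0:
        raise ValueError("size must be positive, e.g. '3840x2160'")
    return {"model": model, "prompt": prompt, "quality": quality, "size": size}


def generate_image_bytes(client, request):
    """Send the request and return the decoded image bytes.

    Assumption: the response exposes base64 data at data[0].b64_json.
    """
    result = client.images.generate(**request)
    return base64.b64decode(result.data[0].b64_json)


request = build_image_request(
    "Where's Waldo style image but it's where is the raccoon "
    "holding a ham radio"
)
```

With the SDK installed and an API key configured, `generate_image_bytes(OpenAI(), request)` would return bytes that can be written to a file. Keeping the request builder separate from the network call makes it easy to compare default and `high` quality runs on the same prompt.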
In practice
- Test image models with specific, detailed object prompts.
- Utilize `high` quality and large sizes for critical image outputs.
- Explore multilingual text generation for global content.
Topics
- ChatGPT Images 2.0
- Image Generation Models
- Multilingual Text Rendering
- High-Resolution Imaging
- AI Model Benchmarking
Best for: Computer Vision Engineer, AI Product Manager, Entrepreneur, AI Engineer, Machine Learning Engineer, Prompt Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.