Where's the raccoon with the ham radio? (ChatGPT Images 2.0)

2026-04-21 · Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, extended

Summary

OpenAI released ChatGPT Images 2.0 on April 21st, 2026. Sam Altman described the image generation model as a leap comparable to the jump from GPT-3 to GPT-5. The test prompt was a "Where's Waldo style image but it's where is the raccoon holding a ham radio". With that prompt, gpt-image-1 failed to generate the raccoon, while Google's Nano Banana 2 placed it in an "Amateur Radio Club" booth. The new gpt-image-2 also failed to include the raccoon when run with default settings. Run again with `outputQuality` set to `high` and dimensions of `3840x2160`, gpt-image-2 generated a complex image that included the raccoon and the ham radio, at a cost of approximately 40 cents for 13,342 output tokens. The model shows significant improvements in complex illustrations, accurate text rendering across various languages, and high-resolution, detailed output.
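The cost figure in the summary implies a per-token rate, which can help when budgeting other high-quality generations. The following is a back-of-the-envelope sketch, not official pricing: it uses only the two numbers quoted above (about $0.40 and 13,342 output tokens), it ignores any input-token charges, and the function names are illustrative.

```python
def implied_rate_per_million(cost_usd: float, output_tokens: int) -> float:
    """Return the implied USD cost per one million output tokens."""
    if output_tokens <= 0:
        raise ValueError("output_tokens must be positive")
    return cost_usd / output_tokens * 1_000_000


def estimate_cost(output_tokens: int, rate_per_million: float) -> float:
    """Estimate the USD cost of a generation from its output token count."""
    return output_tokens * rate_per_million / 1_000_000


# Figures quoted in the summary: roughly $0.40 for 13,342 output tokens.
rate = implied_rate_per_million(0.40, 13_342)
print(f"Implied rate: about ${rate:.2f} per million output tokens")
```

Because the 40-cent figure is itself approximate, treat the derived rate as a rough guide rather than a published price.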

Key takeaway

If you are an AI product manager evaluating image generation capabilities, ChatGPT Images 2.0 is a strong contender: it renders complex scenes, accurate text, and high-resolution outputs more reliably than its predecessor, especially in "thinking mode." Experiment with its `outputQuality` and `size` parameters to reach the detail and fidelity you need, and consider its multilingual text generation for products that serve several markets. Be cautious about relying on models to self-verify image content.

Key insights

ChatGPT Images 2.0 significantly advances image generation, particularly in complex scene composition and accurate text rendering.

Principles

Higher quality settings improve accuracy when generating complex images.
Models can omit a specifically requested object in dense "Where's Waldo" style scenes.

Method

Use `outputQuality: high` and maximum dimensions (e.g., `3840x2160`) with gpt-image-2 for complex, detailed image generation, especially when precise object inclusion is critical.
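One way to compare the default and high-quality runs is to build both sets of settings side by side. This is a minimal sketch, not a real client: the `outputQuality` and `size` names and values come from the summary, while `build_settings` and the dictionary shape are hypothetical, and no API call is made.

```python
from typing import Optional

# Prompt text as quoted in the summary.
PROMPT = ("Where's Waldo style image but it's where is the raccoon "
          "holding a ham radio")


def build_settings(model: str, prompt: str,
                   output_quality: Optional[str] = None,
                   size: Optional[str] = None) -> dict:
    """Build a settings dict (hypothetical shape), omitting unset options."""
    settings = {"model": model, "prompt": prompt}
    if output_quality is not None:
        settings["outputQuality"] = output_quality  # name taken from the summary
    if size is not None:
        settings["size"] = size  # e.g. "3840x2160", as in the summary
    return settings


# Default run: the summary reports the raccoon was missing.
default_run = build_settings("gpt-image-2", PROMPT)

# High-quality run: the summary reports the raccoon and ham radio appeared.
high_run = build_settings("gpt-image-2", PROMPT,
                          output_quality="high", size="3840x2160")

for name, run in (("default", default_run), ("high", high_run)):
    print(name, {k: v for k, v in run.items() if k != "prompt"})
```

Keeping the prompt identical across both runs isolates the effect of the quality and size settings.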

In practice

Test image models with specific, detailed object prompts.
Utilize `high` quality and large sizes for critical image outputs.
Explore multilingual text generation for global content.

Topics

ChatGPT Images 2.0
Image Generation Models
Multilingual Text Rendering
High-Resolution Imaging
AI Model Benchmarking

Best for: Computer Vision Engineer, AI Product Manager, Entrepreneur, AI Engineer, Machine Learning Engineer, Prompt Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.