Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7
Summary
On April 16, 2026, a comparison of image generation capabilities between Alibaba's Qwen3.6-35B-A3B and Anthropic's Claude Opus 4.7 revealed unexpected results. Using a 20.9GB quantized version of Qwen3.6-35B-A3B-UD-Q4_K_S.gguf running locally on a MacBook Pro M5 via LM Studio, the Qwen model produced a superior image of a "pelican riding a bicycle" compared to Claude Opus 4.7, which struggled with the bicycle's frame. A subsequent test involving a "flamingo riding a unicycle" also favored Qwen, which generated a more charismatic and detailed SVG. Despite the "pelican benchmark" being a humorous, informal test, its historical correlation with general model utility appears to be breaking, as a smaller, locally run model outperformed a major proprietary release.
Key takeaway
For AI engineers evaluating image generation models, do not solely rely on general utility benchmarks. Your teams should consider testing smaller, quantized models like Qwen3.6-35B-A3B for specific creative outputs, especially when local inference is a priority. This approach might yield superior results for niche tasks compared to larger, proprietary models, challenging assumptions about model hierarchy.
Key insights
Local, quantized models can surprisingly outperform larger proprietary models in specific creative tasks.
Principles
- Model utility does not always correlate with benchmark performance.
- Quantized models offer significant local inference capabilities.
In practice
- Test local LLMs for niche creative tasks.
- Explore quantized models for specific image generation needs.
Topics
- Qwen3.6-35B-A3B
- Claude Opus 4.7
- Image Generation
- Model Benchmarking
- Quantized Models
Code references
Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Prompt Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.