I Ran the 3B Model That Beat Gemini 3 Pro at Olympiad Math — It Shouldn't Work
Summary
VibeThinker-3B, a 3-billion-parameter model, recently scored 94.3 on AIME 2026, surpassing Gemini 3 Pro's 91.7. The model is MIT-licensed, its 1.5B predecessor cost only $7,800 to post-train, and it delivers first-tier performance on verifiable reasoning tasks such as competition math and competitive coding. Crucially, it is small enough to run on a laptop, which sets it apart from larger general-knowledge models. Its release on June 15, 2026, drew immediate skepticism from an AI community that has grown wary of "benchmark theater." Critics question how well results on such narrow benchmarks carry over to real-world use, leading to accusations of "benchmaxxing."
Key takeaway
Machine learning engineers evaluating models for specific, verifiable reasoning tasks such as competitive programming or advanced mathematics should consider smaller, specialized models. VibeThinker-3B demonstrates that top-tier performance in these domains is achievable with 3-billion-parameter models, which are cheaper to deploy and can run locally, unlike larger general-purpose alternatives. This shifts the focus from raw parameter count to task-specific optimization.
Key insights
A small, cost-effective 3B model achieved top-tier verifiable reasoning performance, challenging assumptions about model size and capability.
Principles
- Specific reasoning benchmarks can reveal capabilities not evident in general knowledge tests.
- Smaller models can achieve competitive performance on specialized, verifiable tasks.
- Cost-effective post-training can yield high-performing, deployable models.
In practice
- Run 3B models locally for competitive math or coding tasks.
- Evaluate specialized models against specific, verifiable reasoning benchmarks.
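To make "verifiable" concrete: AIME answers are integers from 0 to 999, so a model's output can be scored by exact match with no human judgment. The sketch below shows one way such scoring might look. The function names `extract_answer` and `accuracy`, the `\boxed{...}` convention, and the fallback to the last integer in the text are illustrative assumptions, not the evaluation harness used for VibeThinker-3B.

```python
import re
from typing import Optional, Sequence


def extract_answer(completion: str) -> Optional[int]:
    """Pull a final integer answer out of a model completion.

    Assumption: the answer is the last \\boxed{N} if one exists,
    otherwise the last run of digits anywhere in the text.
    """
    boxed = re.findall(r"\\boxed\{\s*(\d+)\s*\}", completion)
    candidates = boxed or re.findall(r"\d+", completion)
    return int(candidates[-1]) if candidates else None


def accuracy(completions: Sequence[str], references: Sequence[int]) -> float:
    """Fraction of completions whose extracted answer equals the reference."""
    if len(completions) != len(references):
        raise ValueError("completions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        extract_answer(text) == ref for text, ref in zip(completions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    # Hypothetical completions, written by hand for illustration only.
    outputs = [
        r"Summing the cases gives \boxed{204}.",
        "I first get 17, but after checking the parity the answer is 73.",
        "I could not solve this one.",
    ]
    answers = [204, 73, 5]
    print(f"accuracy = {accuracy(outputs, answers):.3f}")
```

Because the check is a plain equality test, the same loop works whether the completions come from a 3B model on a laptop or a hosted frontier model, which keeps side-by-side comparisons simple. A real harness would also need to handle multiple samples per problem and answer formats this sketch ignores.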
Topics
- VibeThinker-3B
- Small Language Models
- Mathematical Reasoning
- Competitive Programming
- Benchmark Evaluation
- Model Efficiency
Best for: AI Engineer, Research Scientist, Entrepreneur, AI Scientist, Machine Learning Engineer, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.