Google's upgrade breaks reasoning barriers
Summary
Google has significantly upgraded Gemini 3 Deep Think, its reasoning mode, which now posts leading scores on math, coding, and science benchmarks. The updated model scored 84.6% on ARC-AGI-2, ahead of Claude Opus 4.6 (68.8%) and GPT-5.2 (52.9%), and reached 48.4% on Humanity's Last Exam. It also achieved gold-medal-level results on the 2025 Physics and Chemistry Olympiads and a Codeforces Elo rating of 3,455. Alongside the upgrade, Google unveiled Aletheia, a math agent powered by Deep Think that can autonomously solve open problems and verify proofs. The upgraded Deep Think is available to Google AI Ultra subscribers in the Gemini app, and researchers can get API access through an early access program.
Separately, OpenAI released GPT-5.3-Codex-Spark, a speed-optimized coding model that runs on Cerebras hardware and generates more than 1,000 tokens per second, and MiniMax launched M2.5, an open-source model that rivals frontier coding models at a significantly lower cost.
Key takeaway
For AI architects and machine learning engineers evaluating frontier models, Deep Think's benchmark results and Aletheia's autonomous work on open math problems mark a clear step up in AI reasoning for scientific work. Research teams should look into the Deep Think early access program, and teams building coding agents should evaluate MiniMax's M2.5 as a lower-cost, high-performance option to reduce operating expenses.
Key insights
Google's Deep Think and its Aletheia agent are extending AI reasoning beyond benchmark scores toward autonomous work on open problems in math and science.
Principles
- For latency-sensitive tasks such as real-time code edits, speed can matter more than raw intelligence.
- Lower-cost models that approach frontier performance can broaden access to advanced AI capabilities.
Method
To generate a TV commercial with AI, plan the scenes and prompts with Gemini, then use Higgsfield's image and video tools (Nano Banana Pro and Kling 3.0) to create the frames, generate the video clips, and stitch them together.
In practice
- Use Deep Think for advanced math and science research tasks.
- Deploy GPT-5.3-Codex-Spark for real-time coding edits.
- Consider MiniMax M2.5 for cost-efficient, high-performance coding agents.
Topics
- Google Deep Think
- OpenAI Codex-Spark
- AI Reasoning Benchmarks
- Open-source LLMs
- AI Agentic Systems
Best for: AI Architect, Machine Learning Engineer, AI Scientist, AI Engineer, Data Scientist, General Interest
Editorial summary, takeaway, and curation by AIssential. Original article published by The Rundown AI.