Google's upgrade breaks reasoning barriers

2026-02-13 · Source: The Rundown AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

Google has upgraded its Gemini 3 Deep Think reasoning mode, which now posts leading scores on math, coding, and science benchmarks. The updated model scored 84.6% on ARC-AGI-2, ahead of Opus 4.6 (68.8%) and GPT-5.2 (52.9%), and reached 48.4% on Humanity's Last Exam. Deep Think also earned gold-medal marks on the 2025 Physics & Chemistry Olympiads and a Codeforces Elo rating of 3,455. Alongside the upgrade, Google unveiled Aletheia, a math agent powered by Deep Think that can autonomously solve open problems and verify proofs. The upgrade is available to Google AI Ultra subscribers in the Gemini app, with API access for researchers through an early access program. Separately, OpenAI released GPT-5.3-Codex-Spark, a speed-optimized coding model that runs on Cerebras hardware at more than 1,000 tokens per second. MiniMax launched its open-source M2.5 model, which rivals frontier coding models at a significantly lower cost.
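To put the ARC-AGI-2 figures side by side, the short sketch below computes Deep Think's margin over the other two models in percentage points. It uses only the scores quoted in the summary; the function name `lead_in_points` and the dictionary layout are illustrative choices, not part of any vendor tooling.

```python
# ARC-AGI-2 scores as quoted in the summary (percent).
arc_agi_2 = {
    "Gemini 3 Deep Think": 84.6,
    "Opus 4.6": 68.8,
    "GPT-5.2": 52.9,
}


def lead_in_points(scores, leader):
    """Return the leader's margin over each other model, in percentage points.

    Hypothetical helper for illustration; rounds to one decimal place to
    match the precision of the reported scores.
    """
    top = scores[leader]
    return {
        name: round(top - score, 1)
        for name, score in scores.items()
        if name != leader
    }


gaps = lead_in_points(arc_agi_2, "Gemini 3 Deep Think")
for name, gap in gaps.items():
    print(f"Deep Think leads {name} by {gap} points")
```

These are differences in percentage points on a single benchmark, so they say nothing about how the models compare on other tasks.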

Key takeaway

For AI Architects and Machine Learning Engineers evaluating frontier models, Google's Deep Think and Aletheia show marked gains in reasoning and autonomous problem-solving, which suggests a shift in what AI can do for scientific work. Teams doing research should look into the Deep Think early access program, and teams running coding agents should consider MiniMax's M2.5 as a lower-cost, high-performance option.

Key insights

Google's Deep Think reasoning mode and its Aletheia math agent extend AI reasoning toward autonomous scientific work, including solving open problems and verifying proofs.

Principles

Speed can be prioritized over raw intelligence for specific AI tasks.
Cost-effective frontier models can democratize advanced AI capabilities.

Method

To generate a TV commercial with AI, plan scenes and prompts with Gemini, then use Higgsfield's Image and Video tools (Nano Banana Pro, Kling 3.0) to create frames and stitch videos.

In practice

Use Deep Think for advanced math and science research tasks.
Deploy GPT-5.3-Codex-Spark for real-time coding edits.
Consider MiniMax M2.5 for cost-efficient, high-performance coding agents.

Topics

Google Deep Think
OpenAI Codex-Spark
AI Reasoning Benchmarks
Open-source LLMs
AI Agentic Systems

Best for: AI Architect, Machine Learning Engineer, AI Scientist, AI Engineer, Data Scientist, General Interest

Editorial summary, takeaway, and curation by AIssential. Original article published by The Rundown AI.