New Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5
Summary
Google DeepMind is rolling out the upgraded Gemini 3 Deep Think V2 reasoning mode to Google AI Ultra subscribers and opening early access through Vertex AI and the Gemini API to select users. Reported benchmark results include 84.6% on ARC-AGI-2, 48.4% on Humanity’s Last Exam (HLE) without tools, and a Codeforces Elo of 3455, alongside Olympiad-level performance in physics and chemistry. The mode emphasizes practical scientific and engineering applications such as error detection in math papers, physical system modeling, semiconductor optimization, and a sketch-to-CAD/STL pipeline for 3D printing. François Chollet, creator of the ARC benchmark, highlights the benchmark's role in advancing test-time adaptation and fluid intelligence and projects human-AI parity around 2030. The rollout is framed as a productized, compute-heavy test-time mode rather than a lab demo, and cost figures for the ARC tasks have been disclosed.
Key takeaway
For AI Architects evaluating advanced reasoning and coding models, Gemini 3 Deep Think V2 posts leading benchmark results in scientific and engineering domains and adds a sketch-to-CAD/STL pipeline. Its ARC-AGI-2 score and Codeforces Elo suggest strong capability on complex reasoning and competitive-programming problems, though it is a compute-heavy mode, so per-task cost deserves attention. If you qualify for the Vertex AI / Gemini API early access, consider it for integrating advanced reasoning into your applications, especially for tasks requiring Olympiad-level problem-solving or agentic coding.
Key insights
Advanced AI models like Gemini 3 Deep Think are approaching human-level performance on demanding reasoning and coding benchmarks.
Principles
- Test-time adaptation is crucial for advancing fluid intelligence in AI.
- AI models are becoming cost-effective for high-volume production workloads.
Method
Google's Gemini 3 Deep Think V2 is a compute-heavy, productized test-time reasoning mode aimed at scientific and engineering problem-solving, including a sketch-to-CAD/STL pipeline.
In practice
- Explore Gemini 3 Deep Think for complex scientific and engineering tasks.
- Evaluate MiniMax M2.5 or GLM-5 for cost-effective coding agent workflows.
Topics
- Gemini 3 Deep Think
- AI Code Generation
- Large Language Models
- AI Agent Frameworks
- AI Infrastructure
Code references
- pytorch/ao
- flashinfer-ai/flashinfer-bench
- traceopt-ai/traceml
- VincentKaufmann/noapi-google-search-mcp
- ggml-org/llama.cpp
Best for: Investor, Director of AI/ML, AI Architect, AI Engineer, AI Product Manager, Tech Journalist
Editorial summary, takeaway, and curation by AIssential. Original article published by AINews.