Google's Gemini 3.5 Flash follows Anthropic and OpenAI in making newer AI models significantly pricier
Summary
Google DeepMind has released Gemini 3.5 Flash, a new AI model that delivers over 280 output tokens per second, making it the fastest in its intelligence class. However, it comes with a significant cost increase, operating at 5.5 times the cost of its predecessor, Gemini 3 Flash. Token prices have tripled to \$1.50 per million input tokens and \$9.00 per million output tokens. Although its per-token rates are lower than those of Gemini 3.1 Pro, its high token consumption on agent tasks leads to total benchmark costs 75 percent higher than the Pro model's. Gemini 3.5 Flash shows strong improvements in agentic and multimodal tasks, achieving an Elo score of 1,656 on GDPval-AA and an 84 percent score on MMMU-Pro. Its hallucination rate dropped to 61 percent, though it still trails top competitors. A notable weakness is programming, where it scores 45 on the Artificial Analysis Coding Index, falling behind models like GPT-5.5 and Claude Opus 4.7. This release reflects an industry trend of rising AI model costs driven by complex, multi-step agentic tasks.
Key takeaway
AI engineers and directors of AI/ML evaluating model deployments should shift their focus from raw token prices to total task efficiency. Because Gemini 3.5 Flash consumes more tokens on agentic tasks, its overall cost can exceed that of Pro models with higher per-token prices. Benchmark models rigorously against your own workloads to understand true operating costs and ensure ROI, especially for coding or knowledge work, where gains are harder to quantify.
Key insights
Newer, more capable AI models like Gemini 3.5 Flash are significantly pricier, both because per-token prices have risen and because complex agentic tasks consume more tokens.
Principles
- Raw token price is a misleading cost metric.
- Efficiency (tokens per job) defines true cost.
- Agentic tasks drive higher interaction costs.
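These principles come down to simple arithmetic: the cost of a task is the number of tokens it consumes multiplied by the price per token. The sketch below illustrates this, assuming the Gemini 3.5 Flash prices quoted in the summary. The token counts and the comparison model's prices are hypothetical placeholders chosen for illustration, not measured or published figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per million input tokens
    output_per_million: float  # USD per million output tokens


def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task given its token usage."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Prices for Gemini 3.5 Flash as quoted in the summary.
flash = Pricing(input_per_million=1.50, output_per_million=9.00)

# HYPOTHETICAL prices for a model with higher per-token rates.
pricier = Pricing(input_per_million=3.00, output_per_million=15.00)

# HYPOTHETICAL token usage for the same agentic task on each model.
flash_cost = task_cost(flash, input_tokens=400_000, output_tokens=120_000)
pricier_cost = task_cost(pricier, input_tokens=150_000, output_tokens=40_000)

print(f"cheaper per token: {flash_cost:.2f} USD per task")
print(f"pricier per token: {pricier_cost:.2f} USD per task")
```

With these made-up numbers, the model that is cheaper per token ends up costing more per task because it uses far more tokens to finish the job.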
In practice
- Evaluate AI model costs based on total task efficiency.
- Prioritize models that are strong in your specific use cases, such as multimodal tasks.
- Consider older, cheaper models for simpler tasks.
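One way to put these steps into practice is to log token usage from your own workload and compare the average cost per task across models. The sketch below is a minimal illustration, not a full benchmarking harness: the model labels are arbitrary dictionary keys rather than real API identifiers, the second model's prices are hypothetical, and the logged runs are invented sample data. It also ignores output quality, which would need to be checked separately.

```python
from statistics import mean

# USD per million tokens as (input, output).
PRICES = {
    "gemini-3.5-flash": (1.50, 9.00),      # prices quoted in the summary
    "older-cheaper-model": (0.50, 3.00),   # HYPOTHETICAL prices
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single logged run."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def mean_cost_by_model(runs):
    """Group logged runs by model and average their costs."""
    costs = {}
    for model, input_tokens, output_tokens in runs:
        costs.setdefault(model, []).append(
            run_cost(model, input_tokens, output_tokens)
        )
    return {model: mean(values) for model, values in costs.items()}


# HYPOTHETICAL logged runs: (model, input tokens, output tokens).
runs = [
    ("gemini-3.5-flash", 20_000, 6_000),
    ("gemini-3.5-flash", 30_000, 8_000),
    ("older-cheaper-model", 24_000, 7_000),
    ("older-cheaper-model", 26_000, 9_000),
]

averages = mean_cost_by_model(runs)
cheapest = min(averages, key=averages.get)

for model, cost in averages.items():
    print(f"{model}: {cost:.4f} USD per task on average")
print(f"cheapest for this workload: {cheapest}")
```

For a simple workload like this invented one, where both models use similar token counts, the lower per-token price wins. The comparison only becomes meaningful when it is run on token counts measured from real tasks.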
Topics
- Gemini 3.5 Flash
- AI Model Pricing
- Agentic AI
- Multimodal AI
- Hallucination Rates
- AI Cost Efficiency
- LLM Benchmarking
Best for: CTO, VP of Engineering/Data, AI Architect, AI Engineer, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.