Gemini 3.5 Flash: more expensive, but Google plan to use it for everything
Summary
Google launched Gemini 3.5 Flash on May 19, 2026, making it generally available without a preview phase. The model, with the ID "gemini-3.5-flash", is being integrated across Google products including the Gemini app, Google Search, Google Antigravity, and Gemini Enterprise. It has a January 2025 knowledge cut-off and supports up to 1,048,576 input tokens and 65,536 output tokens. The release is accompanied by a new Interactions API, currently in beta, which offers server-side history management. Gemini 3.5 Flash also comes with a significant price increase: it costs $1.50 per million input tokens and $9 per million output tokens, which is 3x the price of Gemini 3 Flash Preview and 6x that of Gemini 3.1 Flash-Lite. This pricing puts it close to Gemini 3.1 Pro and fits a broader industry trend of rising LLM costs. Benchmark costs point the same way: Artificial Analysis's figure is $1,551.60 for Gemini 3.5 Flash (high), compared to $892.28 for Gemini 3.1 Pro Preview.
Key takeaway
AI Engineers and Directors of AI/ML evaluating LLM deployments should note that Gemini 3.5 Flash is significantly more expensive than previous Flash models, at $1.50 per million input tokens and $9 per million output tokens. This trend, mirrored by other vendors, suggests a new pricing floor for advanced models. Re-evaluate your cost models and use benchmark cost data, such as Artificial Analysis's $1,551.60 for 3.5 Flash, to forecast operational expenses and refine your LLM strategy.
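As a starting point for re-evaluating a cost model, the per-token prices can be turned into a monthly estimate. The following is a minimal sketch, not an official calculator. The `Pricing` class, the `monthly_costs` helper, and the workload of 500 million input and 50 million output tokens are all hypothetical. The two older price points are derived from the stated 3x and 6x multiples, assuming each multiple applies equally to input and output prices; they are not quoted from a price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in US dollars per million tokens."""

    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of a workload with the given token counts."""
        return (
            input_tokens * self.input_per_million
            + output_tokens * self.output_per_million
        ) / 1_000_000

    def divided_by(self, factor: float) -> "Pricing":
        """Back-compute a price point that is `factor` times cheaper."""
        return Pricing(
            self.input_per_million / factor,
            self.output_per_million / factor,
        )


# Prices stated in the summary.
GEMINI_35_FLASH = Pricing(input_per_million=1.50, output_per_million=9.00)

# Assumption: derived from the stated 3x and 6x multiples, applied to
# both input and output prices.
GEMINI_3_FLASH_PREVIEW = GEMINI_35_FLASH.divided_by(3)
GEMINI_31_FLASH_LITE = GEMINI_35_FLASH.divided_by(6)

# Hypothetical monthly workload; replace with your own token counts.
MONTHLY_INPUT_TOKENS = 500_000_000
MONTHLY_OUTPUT_TOKENS = 50_000_000


def monthly_costs() -> dict[str, float]:
    """Estimated monthly spend per model for the hypothetical workload."""
    models = {
        "Gemini 3.5 Flash": GEMINI_35_FLASH,
        "Gemini 3 Flash Preview (implied)": GEMINI_3_FLASH_PREVIEW,
        "Gemini 3.1 Flash-Lite (implied)": GEMINI_31_FLASH_LITE,
    }
    return {
        name: price.cost(MONTHLY_INPUT_TOKENS, MONTHLY_OUTPUT_TOKENS)
        for name, price in models.items()
    }


if __name__ == "__main__":
    for name, dollars in monthly_costs().items():
        print(f"{name}: ${dollars:,.2f}")
```

For this assumed workload the estimate is $1,200 per month on Gemini 3.5 Flash, against $400 and $200 at the implied older price points. Real spend also depends on how many tokens each model consumes for the same task, which list prices alone do not capture.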
Key insights
Gemini 3.5 Flash signals a new phase of higher LLM pricing and broad internal deployment across Google's ecosystem.
Principles
- LLM API costs are increasing across major providers.
- Internal product integration often precedes wider API rollout.
- Benchmark costs reveal true operational expense.
In practice
- Use benchmark data for LLM cost comparisons.
- Assess new LLM versions for price-performance.
- Track API pricing trends for budget planning.
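The first point above can be sketched with the two benchmark costs quoted in the summary. This is an illustrative comparison only: the `relative_cost` helper and the constant names are hypothetical, and the two dollar figures are the only data taken from the source.

```python
def relative_cost(candidate: float, baseline: float) -> float:
    """Return how many times more the candidate cost than the baseline."""
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return candidate / baseline


# Benchmark costs quoted in the summary, in US dollars.
FLASH_35_HIGH_RUN = 1551.60   # Gemini 3.5 Flash (high)
PRO_31_PREVIEW_RUN = 892.28   # Gemini 3.1 Pro Preview

ratio = relative_cost(FLASH_35_HIGH_RUN, PRO_31_PREVIEW_RUN)

if __name__ == "__main__":
    print(f"Gemini 3.5 Flash (high) cost {ratio:.2f}x the Gemini 3.1 Pro Preview run")
    print(f"That is {(ratio - 1) * 100:.0f}% more")
```

On these figures the Flash run cost about 1.74 times the Pro Preview run. A benchmark cost reflects tokens consumed as well as price per token, so it can rank models differently from the price list.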
Topics
- Gemini 3.5 Flash
- LLM Pricing
- Google AI
- API Costs
- Benchmark Data
- Interactions API
Best for: CTO, VP of Engineering/Data, Entrepreneur, AI Engineer, Director of AI/ML, AI Product Manager
Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.