Gemini 3.5 Flash: more expensive, but Google plan to use it for everything

Source: Simon Willison's Weblog · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation) · Depth: Intermediate, quick

Summary

Google launched Gemini 3.5 Flash on May 19, 2026, as a generally available model with no preview phase. The model, whose ID is "gemini-3.5-flash", is being integrated across numerous Google products, including the Gemini app, Google Search, Google Antigravity, and Gemini Enterprise. It has a January 2025 knowledge cut-off and supports up to 1,048,576 input tokens and 65,536 output tokens. The release is accompanied by a new Interactions API, currently in beta, which offers server-side history management.

Gemini 3.5 Flash also comes with a significant price increase: \$1.50 per million input tokens and \$9 per million output tokens, which is 3x the price of Gemini 3 Flash Preview and 6x that of Gemini 3.1 Flash-Lite. This pricing puts it close to Gemini 3.1 Pro and reflects a broader industry trend of rising LLM costs. Artificial Analysis's benchmark cost figures illustrate the point: \$1,551.60 for Gemini 3.5 Flash (high), compared to \$892.28 for Gemini 3.1 Pro Preview.

Key takeaway

If you are an AI engineer or director of AI/ML evaluating LLM deployments, note that Gemini 3.5 Flash costs \$1.50 per million input tokens and \$9 per million output tokens, a significant increase over previous Flash models. Other vendors show the same trend, which suggests a new pricing floor for advanced models. Re-evaluate your cost models and use benchmark data, such as Artificial Analysis's \$1,551.60 figure for Gemini 3.5 Flash, to forecast operational expenses more accurately.
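As a starting point for that re-evaluation, the quoted per-token prices can be turned into a rough cost estimate. The following is a minimal sketch, not an official calculator: the function name `estimate_cost`, the `price_divisor` parameter, and the example workload are all hypothetical. Using a divisor of 3 or 6 to approximate the older models assumes the quoted 3x and 6x multiples apply equally to input and output prices.

```python
from decimal import Decimal

# Prices quoted in the summary, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = Decimal("1.50")
OUTPUT_PRICE_PER_MILLION = Decimal("9.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, price_divisor: int = 1) -> Decimal:
    """Estimate the dollar cost of a workload at Gemini 3.5 Flash prices.

    price_divisor is a hypothetical knob for comparison: 3 approximates
    Gemini 3 Flash Preview and 6 approximates Gemini 3.1 Flash-Lite,
    assuming the quoted multiples apply to both input and output prices.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if price_divisor <= 0:
        raise ValueError("price_divisor must be positive")
    cost = (
        Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION
    ) / TOKENS_PER_MILLION
    return cost / price_divisor


if __name__ == "__main__":
    # Hypothetical monthly workload: 200M input tokens and 20M output tokens.
    workload = (200_000_000, 20_000_000)
    for label, divisor in [
        ("Gemini 3.5 Flash", 1),
        ("Gemini 3 Flash Preview (assumed 1/3 price)", 3),
        ("Gemini 3.1 Flash-Lite (assumed 1/6 price)", 6),
    ]:
        print(f"{label}: ${estimate_cost(*workload, price_divisor=divisor):.2f}")
```

For this hypothetical workload the estimate is \$480 per month at Gemini 3.5 Flash prices. Note that output tokens cost six times as much as input tokens, so output-heavy workloads are the most affected by the increase. The sketch ignores caching discounts, batch pricing, and reasoning-token overhead, none of which are covered in the summary.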

Key insights

Gemini 3.5 Flash signals a new phase of higher LLM pricing and broad internal deployment across Google's ecosystem.

Best for: CTO, VP of Engineering/Data, Entrepreneur, AI Engineer, Director of AI/ML, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.