Gemini 3.1 Flash-Lite
Summary
Google has released Gemini 3.1 Flash-Lite, an update to its Flash-Lite model family, on March 3, 2026. This new iteration is significantly more cost-effective than Gemini 3.1 Pro, priced at $0.25 per million input tokens and $1.5 per million output tokens, making it one-eighth the price of its Pro counterpart. The model features support for four distinct "thinking levels," demonstrated through its ability to generate varied outputs, such as four different pelican illustrations, each reflecting a different level of detail or complexity.
Key takeaway
For MLOps Engineers managing inference costs, Gemini 3.1 Flash-Lite presents a compelling option due to its significantly lower pricing compared to Gemini 3.1 Pro. You should evaluate its performance across its four thinking levels for tasks where cost-efficiency is paramount, potentially integrating it for high-volume, lower-complexity generative applications to optimize operational expenses.
Key insights
Gemini 3.1 Flash-Lite offers a cost-effective alternative with adjustable thinking levels.
Principles
- Cost-efficiency drives model adoption.
- Tiered intelligence levels enhance flexibility.
In practice
- Utilize Flash-Lite for budget-sensitive tasks.
- Experiment with thinking levels for varied outputs.
Topics
- Gemini 3.1 Flash-Lite
- Large Language Models
- Model Pricing
- Generative AI
- Multimodal AI
Best for: CTO, Director of AI/ML, MLOps Engineer, AI Engineer, Machine Learning Engineer, AI Product Manager
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.