Gemini 3.1 Flash-Lite

· Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Fundamental Awareness, quick

Summary

Google has released Gemini 3.1 Flash-Lite, an update to its Flash-Lite model family, on March 3, 2026. This new iteration is significantly more cost-effective than Gemini 3.1 Pro, priced at $0.25 per million input tokens and $1.5 per million output tokens, making it one-eighth the price of its Pro counterpart. The model features support for four distinct "thinking levels," demonstrated through its ability to generate varied outputs, such as four different pelican illustrations, each reflecting a different level of detail or complexity.

Key takeaway

For MLOps Engineers managing inference costs, Gemini 3.1 Flash-Lite presents a compelling option due to its significantly lower pricing compared to Gemini 3.1 Pro. You should evaluate its performance across its four thinking levels for tasks where cost-efficiency is paramount, potentially integrating it for high-volume, lower-complexity generative applications to optimize operational expenses.

Key insights

Gemini 3.1 Flash-Lite offers a cost-effective alternative with adjustable thinking levels.

Principles

In practice

Topics

Best for: CTO, Director of AI/ML, MLOps Engineer, AI Engineer, Machine Learning Engineer, AI Product Manager

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.