Gemini 3.1 Flash-Lite

2026-03-03 · Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Fundamental Awareness, quick

Summary

Google has released Gemini 3.1 Flash-Lite, an update to its Flash-Lite model family, on March 3, 2026. This new iteration is significantly more cost-effective than Gemini 3.1 Pro, priced at $0.25 per million input tokens and $1.5 per million output tokens, making it one-eighth the price of its Pro counterpart. The model features support for four distinct "thinking levels," demonstrated through its ability to generate varied outputs, such as four different pelican illustrations, each reflecting a different level of detail or complexity.

Key takeaway

For MLOps Engineers managing inference costs, Gemini 3.1 Flash-Lite presents a compelling option due to its significantly lower pricing compared to Gemini 3.1 Pro. You should evaluate its performance across its four thinking levels for tasks where cost-efficiency is paramount, potentially integrating it for high-volume, lower-complexity generative applications to optimize operational expenses.

Key insights

Gemini 3.1 Flash-Lite offers a cost-effective alternative with adjustable thinking levels.

Principles

Cost-efficiency drives model adoption.
Tiered intelligence levels enhance flexibility.

In practice

Utilize Flash-Lite for budget-sensitive tasks.
Experiment with thinking levels for varied outputs.

Topics

Gemini 3.1 Flash-Lite
Large Language Models
Model Pricing
Generative AI
Multimodal AI

Best for: CTO, Director of AI/ML, MLOps Engineer, AI Engineer, Machine Learning Engineer, AI Product Manager

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.