How to Count Gemini Tokens Locally

Source: HackerNoon · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Intermediate, extended

Summary

This article details how to count Gemini model tokens both locally and through the API, using the "google-genai" Python SDK version 2.9.0 or newer. It introduces the LocalTokenizer class for offline text token estimation, demonstrating its use with the "gemini-3.1-flash-lite" model. The content explains multimodal tokenization math: images consume up to 280, 560, 1120, or 2240 tokens per image based on resolution; audio uses 25 tokens per second; video combines audio tokens with frame tokens (default 1 FPS, 70 tokens/frame for low/medium resolution, 280 for high); and PDFs use 280, 560, or 1120 tokens per page. Benefits of local token counting include offline operation, reduced API calls, lower latency, cost control, and enhanced privacy. For precise billing, the article advises relying on "usage_metadata" from API responses.
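
The multimodal arithmetic above can be sketched as a small estimator. This is a rough illustration built only from the figures in the summary, not the SDK's own logic: the function names are hypothetical, the summary does not name the image or PDF resolution tiers (so the per-item token value is passed in directly), and rounding partial seconds and frames up is an assumption.

```python
import math

# Figures taken from the summary above.
AUDIO_TOKENS_PER_SECOND = 25
FRAME_TOKENS = {"low": 70, "medium": 70, "high": 280}
IMAGE_TOKEN_TIERS = (280, 560, 1120, 2240)   # upper bounds per image
PDF_PAGE_TOKEN_TIERS = (280, 560, 1120)      # tokens per page


def estimate_audio_tokens(duration_s: float) -> int:
    """Audio: 25 tokens per second (rounding up is an assumption)."""
    return math.ceil(duration_s * AUDIO_TOKENS_PER_SECOND)


def estimate_video_tokens(duration_s: float, resolution: str = "medium",
                          fps: float = 1.0) -> int:
    """Video: audio tokens plus per-frame tokens at the sampled frame rate."""
    frames = math.ceil(duration_s * fps)
    return estimate_audio_tokens(duration_s) + frames * FRAME_TOKENS[resolution]


def estimate_image_tokens(image_count: int, tokens_per_image: int) -> int:
    """Images: an upper bound, given one of the per-image tiers."""
    if tokens_per_image not in IMAGE_TOKEN_TIERS:
        raise ValueError(f"expected one of {IMAGE_TOKEN_TIERS}")
    return image_count * tokens_per_image


def estimate_pdf_tokens(page_count: int, tokens_per_page: int) -> int:
    """PDFs: pages times one of the per-page tiers."""
    if tokens_per_page not in PDF_PAGE_TOKEN_TIERS:
        raise ValueError(f"expected one of {PDF_PAGE_TOKEN_TIERS}")
    return page_count * tokens_per_page


if __name__ == "__main__":
    # A 60-second clip at the default 1 FPS: 60*25 audio + 60*70 frames.
    print(estimate_video_tokens(60))           # 5700
    print(estimate_video_tokens(60, "high"))   # 18300
```

These numbers are planning estimates; the "usage_metadata" returned by the API remains the reference for billing.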

Key takeaway

AI engineers running Gemini deployments can reduce cost and latency by counting tokens locally. Use the LocalTokenizer from the "google-genai" SDK to pre-validate prompt sizes offline, which removes the token-counting API round trip and the rate-limit pressure that comes with it. Local counts also support budget forecasting and routing requests by input token volume, especially for multimodal content, and they keep prompt data on the local machine until a request is actually sent.
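
Routing by input token volume might look like the sketch below. It assumes nothing about the SDK: the counter is passed in as a plain callable, and the route names and thresholds are hypothetical placeholders to be replaced with values that match the target models' context limits and budget.

```python
from typing import Callable

# Hypothetical thresholds; tune them to the models and budget in use.
SMALL_PROMPT_LIMIT = 8_000
HARD_PROMPT_LIMIT = 100_000


def route_request(contents: str, count_tokens: Callable[[str], int],
                  small_limit: int = SMALL_PROMPT_LIMIT,
                  hard_limit: int = HARD_PROMPT_LIMIT) -> str:
    """Pick a route from a locally computed token count.

    Returns one of three hypothetical route names:
    "default", "large-context", or "reject".
    """
    token_count = count_tokens(contents)
    if token_count > hard_limit:
        return "reject"          # too large: fail before any API call
    if token_count > small_limit:
        return "large-context"   # send to a model or tier sized for long input
    return "default"


if __name__ == "__main__":
    # Stand-in counter for demonstration only: whitespace-separated words.
    def word_counter(text: str) -> int:
        return len(text.split())

    print(route_request("short prompt", word_counter))
```

In a real deployment the callable would wrap a local tokenizer, so the routing decision costs no API call.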

Key insights

Local tokenization for Gemini models offers offline estimation, cost control, and privacy benefits for multimodal inputs.

Principles

Method

Install "google-genai[local-tokenizer]>=2.9.0", initialize LocalTokenizer with "model_name", then call "tokenizer.count_tokens(contents)" for local estimation.
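
A minimal sketch of that method follows. The import path "google.genai.local_tokenizer" and the "total_tokens" attribute on the result are assumptions not stated in the summary, so check them against the installed SDK version. The helper name "count_tokens_locally" and the "tokenizer_factory" parameter are hypothetical; the factory exists only so the helper can be exercised without the SDK or a network connection.

```python
from typing import Any, Callable, Optional


def _default_factory(model_name: str) -> Any:
    # Assumed import path; requires the "local-tokenizer" extra to be installed.
    from google.genai.local_tokenizer import LocalTokenizer
    return LocalTokenizer(model_name=model_name)


def count_tokens_locally(contents: Any,
                         model_name: str = "gemini-3.1-flash-lite",
                         tokenizer_factory: Optional[Callable[[str], Any]] = None) -> int:
    """Return a local token estimate for `contents`.

    `tokenizer_factory` is a hypothetical hook: any callable that takes a
    model name and returns an object with a count_tokens(contents) method.
    """
    factory = tokenizer_factory or _default_factory
    tokenizer = factory(model_name)
    result = tokenizer.count_tokens(contents)
    # Assumption: the result object exposes the count as `total_tokens`.
    return result.total_tokens
```

With the SDK installed, calling count_tokens_locally("What is a token?") would use the real tokenizer, which may need to fetch the tokenizer model the first time it is initialized.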

In practice

Topics

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by HackerNoon.