How to Count Gemini Tokens Locally
Summary
This article details how to count Gemini model tokens both locally and through the API, using the "google-genai" Python SDK version 2.9.0 or newer. It introduces the LocalTokenizer class for offline estimation of text tokens and demonstrates it with the "gemini-3.1-flash-lite" model. It also explains how multimodal inputs are counted:
- Images consume up to 280, 560, 1120, or 2240 tokens per image, depending on resolution.
- Audio uses 25 tokens per second.
- Video adds audio tokens to frame tokens. Frames are sampled at 1 FPS by default and cost 70 tokens each at low or medium resolution, or 280 at high.
- PDFs use 280, 560, or 1120 tokens per page.

The benefits of counting locally include offline operation, fewer API calls, lower latency, cost control, and better privacy. For exact billing, the article advises relying on "usage_metadata" from API responses.
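The per-modality rates above can be turned into a rough upper-bound estimator. This is a minimal sketch, not the SDK's own logic, and it rests on a few assumptions:
- The tier names ("low", "medium", "high", "ultra_high") are hypothetical labels mapped onto the article's figures.
- Rounding partial seconds and frames up is our own choice.

```python
import math

# Rates as quoted in the article. Tier names are hypothetical labels, not SDK enums.
IMAGE_TOKENS_PER_IMAGE = {"low": 280, "medium": 560, "high": 1120, "ultra_high": 2240}
PDF_TOKENS_PER_PAGE = {"low": 280, "medium": 560, "high": 1120}
VIDEO_TOKENS_PER_FRAME = {"low": 70, "medium": 70, "high": 280}
AUDIO_TOKENS_PER_SECOND = 25
DEFAULT_VIDEO_FPS = 1.0


def image_tokens(num_images: int, tier: str = "medium") -> int:
    """Upper bound: the article gives per-image ceilings."""
    return num_images * IMAGE_TOKENS_PER_IMAGE[tier]


def audio_tokens(seconds: float) -> int:
    # Assumption: partial seconds are rounded up.
    return math.ceil(seconds * AUDIO_TOKENS_PER_SECOND)


def video_tokens(seconds: float, tier: str = "medium",
                 fps: float = DEFAULT_VIDEO_FPS, with_audio: bool = True) -> int:
    """Frame tokens plus, if the clip has sound, its audio-track tokens."""
    frames = math.ceil(seconds * fps)  # assumption: partial frames rounded up
    total = frames * VIDEO_TOKENS_PER_FRAME[tier]
    if with_audio:
        total += audio_tokens(seconds)
    return total


def pdf_tokens(pages: int, tier: str = "medium") -> int:
    return pages * PDF_TOKENS_PER_PAGE[tier]


if __name__ == "__main__":
    # A 60 s clip at the defaults: 60 frames * 70 + 60 s * 25 = 5700.
    print(video_tokens(60))
```

Treat these numbers as planning estimates. The billed figure still comes from "usage_metadata".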
Key takeaway
For AI Engineers managing Gemini deployments, local token counting is a practical lever for both cost and performance. Use the LocalTokenizer from the "google-genai" SDK to pre-validate prompt sizes offline. This saves a round trip to the API and keeps those checks from counting against rate limits. Local counts also support budget forecasting and let you route requests by input token volume, which matters most for multimodal content. Prompt content also stays on your own machine while it is being counted.
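Pre-validating a prompt before sending it can be as simple as comparing a local count against a ceiling and a price. This sketch assumes you already have an input-token count, for example from LocalTokenizer plus the multimodal rates listed in the summary. The ceiling, the per-million-token price, and the names "InputBudget" and "prevalidate" are all hypothetical. Supply real values from your own limits and Google's current pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputBudget:
    max_input_tokens: int          # hypothetical per-request ceiling you choose
    usd_per_million_tokens: float  # hypothetical price; use the official pricing page


class PromptTooLarge(ValueError):
    """Raised when a prompt exceeds the configured ceiling."""


def prevalidate(input_tokens: int, budget: InputBudget) -> float:
    """Reject oversize prompts locally; otherwise return the forecast input cost in USD."""
    if input_tokens < 0:
        raise ValueError("token count cannot be negative")
    if input_tokens > budget.max_input_tokens:
        raise PromptTooLarge(
            f"{input_tokens} tokens exceeds the limit of {budget.max_input_tokens}"
        )
    return input_tokens / 1_000_000 * budget.usd_per_million_tokens
```

Raising a dedicated exception lets callers treat "too large" differently from other errors, for example by truncating or re-routing instead of failing.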
Key insights
Local tokenization for Gemini models offers offline estimation, cost control, and privacy benefits for multimodal inputs.
Principles
- Tokenization works like a compression codec for LLMs: it turns raw input into the tokens the model actually processes.
- Multimodal inputs are processed by specialized tokenizers with distinct calculation rules.
- API "usage_metadata" is the single source of truth for billing.
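Because "usage_metadata" is authoritative, it is worth checking local estimates against it after each call. This sketch reads "prompt_token_count", a field the SDK exposes on "response.usage_metadata". The helper name "reconcile" and the drift metric are our own illustrative choices. The comparison covers prompt tokens only, not output tokens.

```python
from typing import Any, Dict


def reconcile(local_estimate: int, usage_metadata: Any) -> Dict[str, Any]:
    """Compare a local prompt-token estimate with the billed count.

    usage_metadata is expected to look like response.usage_metadata from the
    google-genai SDK; only its prompt_token_count attribute is read here.
    """
    # The field may be missing or None; treat that as "nothing billed yet".
    billed = getattr(usage_metadata, "prompt_token_count", None) or 0
    # Positive drift means the local estimate undercounted the billed prompt tokens.
    drift = (billed - local_estimate) / billed if billed else 0.0
    return {
        "estimated": local_estimate,
        "billed_prompt_tokens": billed,
        "relative_drift": drift,
    }
```

Logging "relative_drift" over time shows whether local estimates are reliable enough to drive budgeting decisions.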
Method
Install "google-genai[local-tokenizer]>=2.9.0", initialize LocalTokenizer with "model_name", then call "tokenizer.count_tokens(contents)" for local estimation.
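The Method step can be wrapped so the rest of an application stays testable without the SDK installed. This is a sketch, not code from the article, and two parts of it are assumptions:
- The lazy import path "google.genai.local_tokenizer" reflects our understanding of the SDK.
- So does the "total_tokens" field on the returned result.

Check both against the version you install. Only "LocalTokenizer", "model_name", "count_tokens", and the model name come from the article. "make_local_tokenizer" and "local_token_count" are hypothetical helper names.

```python
from typing import Any, Protocol


class TokenCounter(Protocol):
    """Anything with a count_tokens(contents) method, e.g. LocalTokenizer."""

    def count_tokens(self, contents: Any) -> Any: ...


def make_local_tokenizer(model_name: str = "gemini-3.1-flash-lite") -> TokenCounter:
    # Imported lazily so this module still loads where the
    # google-genai[local-tokenizer] extra is not installed.
    # Assumed import path; verify against your SDK version.
    from google.genai.local_tokenizer import LocalTokenizer

    return LocalTokenizer(model_name=model_name)


def local_token_count(tokenizer: TokenCounter, contents: Any) -> int:
    """Return the total token count reported by tokenizer.count_tokens()."""
    result = tokenizer.count_tokens(contents)
    return result.total_tokens  # assumed field name on the returned result
```

In practice, call "make_local_tokenizer()" once at startup and then "local_token_count(tokenizer, prompt)" per request. Tests can pass any object with a matching "count_tokens" method.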
In practice
- Use LocalTokenizer to count tokens in sensitive data offline, so that content is never sent to the API just to be measured.
- Route requests to different models based on local token count estimates.
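Routing on a local estimate comes down to picking the first tier, in ascending order of ceiling, that can hold the input. In this sketch the tier table, its thresholds, and the "long-context-model" name are hypothetical placeholders. Only "gemini-3.1-flash-lite" comes from the article.

```python
from typing import Sequence, Tuple

# Hypothetical routing table: (max input tokens, model name), any order.
DEFAULT_TIERS: Tuple[Tuple[int, str], ...] = (
    (8_000, "gemini-3.1-flash-lite"),
    (200_000, "long-context-model"),  # placeholder name, not a real model ID
)


def route_by_tokens(token_count: int,
                    tiers: Sequence[Tuple[int, str]] = DEFAULT_TIERS) -> str:
    """Return the model of the smallest tier whose ceiling covers token_count."""
    for ceiling, model in sorted(tiers):
        if token_count <= ceiling:
            return model
    raise ValueError(f"no tier accepts {token_count} tokens; split or truncate the input")
```

Sorting inside the function means callers can declare tiers in any order. Raising when nothing fits makes oversize inputs an explicit decision instead of a silent API error.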
Topics
- Gemini API
- Tokenization
- Multimodal AI
- Local Tokenizer
- Cost Optimization
- Google Gen AI SDK
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by HackerNoon.