AdaTok: Self-Budgeting Image Tokenization with Quality-Preserving Dynamic Tokens
Summary
AdaTok is a self-budgeting discrete 1D image tokenizer designed to address the inefficiency of fixed-length tokenization for images with heterogeneous visual complexity. It co-designs representation and allocation so that token prefixes remain decodable across varying budgets and the tokenizer learns the optimal prefix length for each image. AdaTok comprises two modules. Prioritized Representation Learning (PRL) uses nested tail masking to order tokens by importance and Multi-Head LoRA (MH-LoRA) to resolve budget-dependent semantic shift. Adaptive Token Allocation (ATA) employs a lightweight deterministic-group GRPO policy with Dynamic Pareto Weighting to select content-adaptive token budgets. On ImageNet-1K, AdaTok-Full achieves an rFID of 1.31 at 256 tokens, while AdaTok-Adaptive attains an rFID of 1.50 using only ~118 tokens on average, outperforming discrete 1D baselines. The adaptive setting also yields approximately 2.1x throughput in autoregressive image generation compared to a fixed 256-token decode.
Key takeaway
For machine learning engineers building image generation systems, AdaTok's self-budgeting approach cuts the average token count from 256 to about 118 at a small reconstruction cost (rFID 1.50 versus 1.31 for the full 256-token setting). This corresponds to approximately 2.1x autoregressive generation throughput. If a fixed token budget is your bottleneck, consider integrating AdaTok to allocate tokens per image, reducing compute while keeping visual fidelity close to the full-budget result.
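As a rough sanity check on the reported throughput figure, the sketch below computes the ratio of decode steps between the fixed and adaptive budgets. It assumes that autoregressive decode time scales linearly with the number of generated tokens, which is a simplification introduced here, not a claim from the paper. The function name `estimated_speedup` is hypothetical.

```python
# Back-of-the-envelope check. Assumption: decode time is proportional
# to the number of generated tokens (ignores fixed per-image overhead).
FULL_TOKENS = 256       # fixed budget reported in the summary
ADAPTIVE_TOKENS = 118   # approximate average adaptive budget reported in the summary


def estimated_speedup(full_tokens: int, avg_tokens: float) -> float:
    """Ratio of decode steps under the linear-cost assumption."""
    return full_tokens / avg_tokens


speedup = estimated_speedup(FULL_TOKENS, ADAPTIVE_TOKENS)
token_savings = 1 - ADAPTIVE_TOKENS / FULL_TOKENS
print(f"estimated speedup: {speedup:.2f}x, tokens saved: {token_savings:.0%}")
```

Under this assumption the token ratio alone is about 2.17, which is in line with the reported figure of approximately 2.1x; the small gap is plausibly overhead that does not shrink with the token count.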
Key insights
AdaTok enables self-budgeting image tokenization by co-designing representation and allocation for dynamic, quality-preserving token lengths.
Principles
- Visual complexity demands per-instance token allocation.
- Actionable elasticity requires representation-allocation co-design.
- Budget-dependent semantic shift needs specialized decoders.
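One way to picture budget-specialized decoding is a shared linear layer with several low-rank adapters, one per budget range. The sketch below is a minimal illustration under assumptions: the summary does not specify how MH-LoRA routes budgets to heads, so the bucket edges, the class name `MultiHeadLoRALinear`, and the routing rule are all hypothetical.

```python
import numpy as np


class MultiHeadLoRALinear:
    """Shared base weight plus one low-rank adapter per budget bucket.

    Hypothetical layout for illustration; not the paper's implementation.
    """

    def __init__(self, d_in, d_out, rank, bucket_edges, seed=0):
        rng = np.random.default_rng(seed)
        # Shared (conceptually frozen) base weight.
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        # Upper budget edge of each bucket, sorted ascending.
        self.bucket_edges = np.asarray(bucket_edges)
        n_heads = len(bucket_edges)
        # Standard LoRA init: A random, B zero, so each head starts
        # out identical to the base layer.
        self.A = rng.standard_normal((n_heads, d_in, rank)) * 0.01
        self.B = np.zeros((n_heads, rank, d_out))

    def head_for(self, budget):
        # First bucket whose upper edge is >= budget; clip to the last head.
        idx = int(np.searchsorted(self.bucket_edges, budget, side="left"))
        return min(idx, len(self.bucket_edges) - 1)

    def __call__(self, x, budget):
        h = self.head_for(budget)
        return x @ self.W + (x @ self.A[h]) @ self.B[h]


layer = MultiHeadLoRALinear(d_in=16, d_out=16, rank=4, bucket_edges=[64, 128, 256])
x = np.ones((118, 16))        # 118 token embeddings of width 16
y = layer(x, budget=118)      # routed to the 65..128 bucket
```

The design point is that only the small `A` and `B` matrices differ per budget range, so the decoder can adapt to short prefixes without duplicating its full weights.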
Method
AdaTok uses Prioritized Representation Learning with Nested Tail Masking and Multi-Head LoRA, coupled with Adaptive Token Allocation via deterministic-group GRPO and Dynamic Pareto Weighting for self-budgeting.
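The nested tail masking idea can be sketched as follows: during training, a prefix length is drawn per image and every token after that prefix is replaced by a mask token, so the decoder learns to reconstruct from any prefix. This is a minimal sketch under assumptions; the budget set, the codebook size, `MASK_ID`, and both function names are illustrative choices, not values from the paper.

```python
import numpy as np


def nested_tail_mask(tokens, keep, mask_id):
    """Keep the first `keep` tokens and replace the tail with `mask_id`."""
    masked = np.asarray(tokens).copy()
    masked[..., keep:] = mask_id
    return masked


def sample_prefix_lengths(batch_size, budgets, rng):
    """Draw one prefix length per image from a fixed budget set (assumed uniform)."""
    return rng.choice(budgets, size=batch_size)


rng = np.random.default_rng(0)
CODEBOOK_SIZE = 4096                 # hypothetical codebook size
MASK_ID = CODEBOOK_SIZE              # an id outside the codebook range
budgets = np.array([32, 64, 128, 256])

tokens = rng.integers(0, CODEBOOK_SIZE, size=(4, 256))   # 4 images, 256 tokens each
keeps = sample_prefix_lengths(len(tokens), budgets, rng)
masked = np.stack(
    [nested_tail_mask(t, k, MASK_ID) for t, k in zip(tokens, keeps)]
)
```

Because every training step truncates from the tail, earlier positions are seen by the decoder far more often than later ones, which is what pushes the most important information toward the front of the sequence.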
In practice
- Use AdaTok for efficient image generation with variable token lengths.
- Implement MH-LoRA to handle semantic shifts in adaptive decoders.
- Apply GRPO with DPW for multi-objective policy optimization.
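The group-relative part of GRPO can be illustrated with a small example: score a group of candidate budgets for one image, then normalize the rewards within that group to obtain advantages. The sketch makes several assumptions. "Deterministic group" is read here as evaluating a fixed set of budgets rather than sampling them, the quality scores are made-up numbers, and fixed weights stand in for Dynamic Pareto Weighting, whose update rule this summary does not describe.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize rewards within one group."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def scalarized_reward(quality, budget, max_budget, w_quality, w_cost):
    """Trade quality against token cost with fixed weights (stand-in for DPW)."""
    return w_quality * quality - w_cost * (budget / max_budget)


# One image, four candidate budgets evaluated deterministically.
budgets = np.array([32, 64, 128, 256])
quality = np.array([0.60, 0.80, 0.90, 0.92])   # made-up reconstruction scores

rewards = scalarized_reward(quality, budgets, max_budget=256,
                            w_quality=1.0, w_cost=0.3)
advantages = group_relative_advantages(rewards)
best_budget = int(budgets[np.argmax(advantages)])
```

With these illustrative numbers the 128-token budget receives the highest advantage: the 256-token option gains little quality for twice the cost, while the shorter options lose too much quality. A dynamic weighting scheme would adjust `w_quality` and `w_cost` during training instead of fixing them.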
Topics
- Image Tokenization
- Adaptive Token Allocation
- Group Relative Policy Optimization
- Multi-Head LoRA
- Autoregressive Image Generation
- Rate-Distortion Optimization
Best for: Computer Vision Engineer, AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.