AdaTok: Self-Budgeting Image Tokenization with Quality-Preserving Dynamic Tokens

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

AdaTok is a self-budgeting discrete 1D image tokenizer that addresses the inefficiency of fixed-length tokenization for images of heterogeneous visual complexity. It co-designs representation and allocation so that token prefixes remain decodable across varying budgets and the tokenizer learns a suitable prefix length for each image. AdaTok comprises two modules. Prioritized Representation Learning (PRL) uses nested tail masking and Multi-Head LoRA (MH-LoRA) to order tokens by importance and to resolve budget-dependent semantic shift. Adaptive Token Allocation (ATA) employs a lightweight deterministic-group GRPO policy with Dynamic Pareto Weighting to select content-adaptive token budgets. On ImageNet-1K, AdaTok-Full achieves an rFID of 1.31 at 256 tokens, while AdaTok-Adaptive attains an rFID of 1.50 using only ~118 tokens on average, outperforming discrete 1D baselines. The adaptive variant also yields approximately 2.1x the throughput of a fixed 256-token decode in autoregressive image generation.
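As a rough sanity check on the figures above, the ratio of token counts alone lands close to the reported throughput gain. The sketch below assumes decode cost scales linearly with the number of generated tokens, which is a simplification made here for illustration and not a claim from the source.

```python
# Back-of-the-envelope check. ASSUMPTION: decode cost is linear in token count.
FULL_TOKENS = 256      # fixed budget reported in the summary
ADAPTIVE_TOKENS = 118  # approximate average adaptive budget reported in the summary

token_ratio = FULL_TOKENS / ADAPTIVE_TOKENS        # how many times fewer tokens
tokens_saved = 1 - ADAPTIVE_TOKENS / FULL_TOKENS   # fraction of tokens avoided

print(f"token ratio: {token_ratio:.2f}x")    # token ratio: 2.17x
print(f"tokens saved: {tokens_saved:.1%}")   # tokens saved: 53.9%
```

The 2.17x token ratio is in line with the approximately 2.1x throughput figure. Per-image overheads that do not shrink with the token count would account for a measured gain slightly below the raw ratio.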

Key takeaway

For Machine Learning Engineers building image generation systems, AdaTok targets the cost of fixed token budgets. Its self-budgeting approach reaches an rFID of 1.50 with about 118 tokens on average, close to the 1.31 it achieves with the full 256 tokens, and delivers approximately 2.1x the autoregressive generation throughput of a fixed 256-token decode. Consider integrating AdaTok to allocate tokens per image, reducing compute at a small cost in reconstruction fidelity.

Key insights

AdaTok enables self-budgeting image tokenization by co-designing representation and allocation for dynamic, quality-preserving token lengths.

Principles

Visual complexity demands per-instance token allocation.
Actionable elasticity requires representation-allocation co-design.
Budget-dependent semantic shift needs specialized decoders.
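One way to picture a budget-specialized decoder is a shared base weight plus one low-rank adapter per budget range, selected at decode time. The sketch below is a hypothetical illustration, not the paper's MH-LoRA implementation. The bucket boundaries, the helper names (`pick_head`, `effective_weight`), and the toy matrices are all assumptions.

```python
# Hypothetical sketch: one low-rank adapter ("head") per token-budget range.
# None of these names or values come from the source.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def pick_head(budget, boundaries):
    """Return the index of the first bucket whose upper bound covers `budget`."""
    for i, upper in enumerate(boundaries):
        if budget <= upper:
            return i
    return len(boundaries) - 1

def effective_weight(base, heads, budget, boundaries):
    """Standard LoRA update, base + up @ down, using the head for this budget."""
    down, up = heads[pick_head(budget, boundaries)]  # down: r x d_in, up: d_out x r
    return add(base, matmul(up, down))

boundaries = [64, 128, 256]            # assumed bucket upper bounds
base = [[1.0, 0.0], [0.0, 1.0]]        # toy shared decoder weight
heads = [                              # toy rank-1 adapters, one per bucket
    ([[1.0, 0.0]], [[0.3], [0.0]]),
    ([[1.0, 0.0]], [[0.1], [0.0]]),
    ([[1.0, 0.0]], [[0.0], [0.0]]),
]

head_for_118 = pick_head(118, boundaries)
weight_for_118 = effective_weight(base, heads, 118, boundaries)
```

Keeping the base weight shared and routing only the low-rank update keeps the added parameter count small relative to training a separate decoder per budget.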

Method

AdaTok uses Prioritized Representation Learning with Nested Tail Masking and Multi-Head LoRA, coupled with Adaptive Token Allocation via deterministic-group GRPO and Dynamic Pareto Weighting for self-budgeting.
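The idea behind nested tail masking can be sketched as keeping a prefix of the token sequence and masking the rest, so that every shorter budget is a prefix of every longer one. This is a minimal sketch under stated assumptions: the `MASK` id, the budget list, and the function names are hypothetical, and the source does not specify how budgets are sampled during training.

```python
MASK = -1  # hypothetical placeholder id for a masked token slot

def tail_mask(tokens, keep):
    """Keep the first `keep` tokens and replace everything after them with MASK."""
    keep = max(1, min(keep, len(tokens)))
    return tokens[:keep] + [MASK] * (len(tokens) - keep)

def nested_views(tokens, budgets):
    """Return one tail-masked copy of `tokens` per budget, shortest first."""
    return [tail_mask(tokens, b) for b in sorted(budgets)]

tokens = list(range(100, 108))   # toy 8-token sequence
budgets = [2, 4, 8]              # assumed budget grid
views = nested_views(tokens, budgets)

for budget, view in zip(sorted(budgets), views):
    print(budget, view)
```

Because each view shares its unmasked prefix with all longer views, a decoder trained on these views is pushed to place the most useful information in the earliest tokens.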

In practice

Use AdaTok for efficient image generation with variable token lengths.
Implement MH-LoRA to handle semantic shifts in adaptive decoders.
Apply GRPO with DPW for multi-objective policy optimization.
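For the policy-optimization item above, the group-relative part of GRPO can be sketched as scoring every candidate budget for one image and normalizing the rewards within that group. This sketch rests on loud assumptions: the quality scores are made up, the reward form and the weights `w_quality` and `w_cost` are fixed constants chosen for illustration, and Dynamic Pareto Weighting is not reproduced because this summary does not specify it.

```python
from statistics import mean, pstdev

def scalar_reward(quality, tokens, max_tokens, w_quality, w_cost):
    """ASSUMED reward: weighted quality minus weighted normalized token cost."""
    return w_quality * quality - w_cost * (tokens / max_tokens)

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Deterministic group: every candidate budget is evaluated, none are sampled.
budgets = [32, 64, 128, 256]
quality = [0.60, 0.80, 0.92, 0.95]   # made-up reconstruction scores in [0, 1]

rewards = [scalar_reward(q, b, 256, w_quality=1.0, w_cost=0.3)
           for q, b in zip(quality, budgets)]
advantages = group_relative_advantages(rewards)
best_budget = budgets[advantages.index(max(advantages))]

print(best_budget)  # 128
```

With these toy numbers the 128-token budget wins because the quality gain from 128 to 256 tokens is smaller than the added cost term. A dynamic weighting scheme would adjust the two weights during training instead of fixing them.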

Topics

Image Tokenization
Adaptive Token Allocation
Group Relative Policy Optimization
Multi-Head LoRA
Autoregressive Image Generation
Rate-Distortion Optimization

Best for: Computer Vision Engineer, AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.