AdaTok: Self-Budgeting Image Tokenization with Quality-Preserving Dynamic Tokens

Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

AdaTok is a self-budgeting discrete 1D image tokenizer that addresses the inefficiency of fixed-length tokenization for images of heterogeneous visual complexity. It co-designs representation and allocation: token prefixes remain decodable across varying budgets, and the tokenizer learns which prefix length to use for each image. AdaTok comprises two modules. Prioritized Representation Learning (PRL) uses nested tail masking and Multi-Head LoRA (MH-LoRA) to order tokens and resolve semantic shift. Adaptive Token Allocation (ATA) employs a lightweight deterministic-group GRPO policy with Dynamic Pareto Weighting to select a content-adaptive token budget. On ImageNet-1K, AdaTok-Full achieves an rFID of 1.31 at 256 tokens, while AdaTok-Adaptive attains an rFID of 1.50 using only ~118 tokens on average, outperforming discrete 1D baselines. The adaptive setting also yields approximately 2.1x throughput in autoregressive image generation compared with a fixed 256-token decode.
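The token counts quoted above can be turned into a quick sanity check. The sketch below uses only the two figures from the summary (256 and ~118 tokens); the function names are illustrative, and the comparison with the reported ~2.1x throughput assumes, as a simplification, that decode cost grows roughly in proportion to the number of tokens generated.

```python
# Figures quoted in the summary; everything else here is illustrative.
FULL_TOKENS = 256       # fixed-length budget (AdaTok-Full)
ADAPTIVE_TOKENS = 118   # approximate average budget (AdaTok-Adaptive)


def token_ratio(full: int, adaptive: float) -> float:
    """How many times fewer tokens the adaptive setting decodes on average."""
    return full / adaptive


def token_savings(full: int, adaptive: float) -> float:
    """Fraction of the fixed budget saved on average."""
    return 1.0 - adaptive / full


ratio = token_ratio(FULL_TOKENS, ADAPTIVE_TOKENS)
savings = token_savings(FULL_TOKENS, ADAPTIVE_TOKENS)
print(f"token ratio: {ratio:.2f}x, savings: {savings:.0%}")
# prints "token ratio: 2.17x, savings: 54%"
```

A token ratio of about 2.17 is in line with the reported ~2.1x throughput, which suggests that most of the speedup comes from decoding fewer tokens rather than from any change to the generator itself.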

Key takeaway

For machine learning engineers building image generation systems, AdaTok offers a concrete efficiency gain. Where a fixed token budget is the bottleneck, its self-budgeting approach trades a small amount of reconstruction quality (rFID 1.50 at ~118 tokens on average versus 1.31 at the full 256) for less than half the tokens, which translates to approximately 2.1x autoregressive generation throughput relative to a fixed 256-token decode. Consider integrating AdaTok to match token usage to image content and reduce compute cost with only a modest loss in reconstruction fidelity.

Key insights

AdaTok enables self-budgeting image tokenization by co-designing representation and allocation for dynamic, quality-preserving token lengths.

Method

AdaTok uses Prioritized Representation Learning with Nested Tail Masking and Multi-Head LoRA, coupled with Adaptive Token Allocation via deterministic-group GRPO and Dynamic Pareto Weighting for self-budgeting.
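The two ingredients named above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the masking schedule, the reward, or the form of Dynamic Pareto Weighting, so `MASK_ID`, the budget list, and the example rewards are all hypothetical. The first function shows tail masking (keep a prefix, mask the rest), and the second shows the group-relative advantage normalization that GRPO-style methods generally use, applied here to one reward per candidate budget.

```python
from typing import List, Sequence

MASK_ID = -1  # hypothetical id standing in for a learned mask token


def tail_mask(tokens: Sequence[int], keep: int, mask_id: int = MASK_ID) -> List[int]:
    """Keep the first `keep` tokens and replace every later token with `mask_id`."""
    if not 0 <= keep <= len(tokens):
        raise ValueError("keep must be between 0 and len(tokens)")
    return list(tokens[:keep]) + [mask_id] * (len(tokens) - keep)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Nested budgets: each shorter prefix is contained in every longer one.
tokens = list(range(8))          # stand-in for a 1D token sequence
budgets = [2, 4, 8]              # hypothetical candidate budgets
masked = {k: tail_mask(tokens, k) for k in budgets}

# Hypothetical rewards, one per budget (e.g. quality minus a token-cost penalty).
rewards = [0.2, 0.9, 0.7]
advantages = group_relative_advantages(rewards)
best_budget = budgets[advantages.index(max(advantages))]
```

Because every budget keeps a prefix of the same sequence, a decoder trained on such masked inputs has to place the most useful information early, which is the property that lets a policy later choose a short prefix for simple images. Evaluating a fixed set of budgets per image, instead of sampling them, is one plausible reading of "deterministic-group"; the source summary does not spell this out.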

Topics

Best for: Computer Vision Engineer, AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.