Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

Source: Artificial Intelligence · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Reroute is a training-free plug-in that targets the high inference cost of vision-language models (VLMs) by changing how visual tokens are handled. Conventional rank-and-remove methods discard low-ranked visual tokens permanently. Reroute instead uses a recoverable routing strategy: tokens judged less important at one decoder stage bypass processing and re-enter the candidate pool for later routing decisions, on the premise that token relevance can change across decoder depth. Reroute plugs into existing attention-score ranking rules and stage-wise schedules, and it preserves the theoretical TFLOPs and KV-cache budget of the pruning method it augments. Evaluated across FastV, PDrop, and Nüwa variants on LLaVA-1.5 and Qwen backbones, Reroute improves grounding performance under aggressive token reduction while maintaining general VQA accuracy. The work was published on 2026-06-10.

Key takeaway

Machine learning engineers optimizing vision-language model inference should reconsider irreversible visual token pruning. A recoverable routing approach such as Reroute manages KV-cache memory and attention computation within the same budget, while letting tokens that were deferred at one stage re-enter processing at later decoder stages. This allows aggressive token reduction without sacrificing grounding performance, which matters most for grounding-sensitive queries.

Key insights

Visual token reduction in VLMs should prefer recoverable routing over irreversible pruning, because a token's importance can change across decoder depth.

Principles

Method

At each stage, Reroute sends the selected visual tokens through the decoder blocks. The remaining tokens are deferred: they bypass those blocks and re-enter the candidate pool at subsequent stages. Selection reuses the ranking rule of the underlying pruning method.
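The contrast between rank-and-remove and recoverable routing can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a per-stage importance score is already available for every visual token (in practice this would come from the attention-score ranking rule, and how deferred tokens are scored at later stages is not described in this summary). The names rank_and_remove, recoverable_route, scores_per_stage, and keep_per_stage are hypothetical.

```python
import numpy as np


def rank_and_remove(scores_per_stage, keep_per_stage):
    """Irreversible pruning: each stage ranks only the survivors of the previous one."""
    pool = np.arange(scores_per_stage[0].shape[0])  # candidate token indices
    routed = []
    for scores, k in zip(scores_per_stage, keep_per_stage):
        # Rank the current pool by descending score and keep the top k.
        order = np.argsort(-scores[pool], kind="stable")[:k]
        pool = np.sort(pool[order])  # the pool shrinks for good
        routed.append(pool)
    return routed


def recoverable_route(scores_per_stage, keep_per_stage):
    """Recoverable routing (sketch): every stage ranks the full candidate pool.

    Assumption: scores for deferred tokens are available at later stages.
    """
    routed = []
    for scores, k in zip(scores_per_stage, keep_per_stage):
        # Deferred tokens stay in the pool, so any token may be selected again.
        top = np.argsort(-scores, kind="stable")[:k]
        routed.append(np.sort(top))
    return routed


if __name__ == "__main__":
    # Toy example: 6 visual tokens, 2 stages, importance flips between stages.
    scores = [
        np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3]),
        np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7]),
    ]
    keep = [3, 2]  # same stage-wise schedule for both schemes
    for name, fn in [("remove", rank_and_remove), ("reroute", recoverable_route)]:
        print(name, [r.tolist() for r in fn(scores, keep)])
```

In this toy case both schemes process three tokens at the first stage and two at the second, so the per-stage token budget is identical. Only the recoverable variant can pick tokens 1 and 3 at the second stage, because rank-and-remove dropped them at the first.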

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.