DCP-Prune: Ultra-Low Token Pruning with Distribution Consistency Preservation

2026-06-15 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

DCP-Prune is a two-stage token pruning framework designed to keep vision-language model performance stable under ultra-low visual token budgets. Existing vision token pruning techniques often lose significant accuracy when token counts are severely reduced, a problem linked to growing feature distribution shift. DCP-Prune introduces a lightweight distribution consistency metric to quantify this shift. Its first stage, Anchor-Context Graph Recovery (ACGR), transfers contextual information before tokens are removed. Its second stage, Text-Aware Token Cluster Selection (TATCS), dynamically re-selects representative tokens when a substantial distribution shift is detected. In the reported experiments, DCP-Prune is more accurate and more stable than existing methods, retaining 92.1% of the upper-bound average performance on LLaVA-1.5-7B with only 16 visual tokens.

Key takeaway

For machine learning engineers optimizing vision-language models for extreme efficiency, DCP-Prune targets the ultra-low token regime where existing pruning methods become unstable. If you need to deploy large vision-language models such as LLaVA-1.5-7B on resource-constrained devices, this two-stage framework is worth investigating. By actively managing feature distribution consistency, it is reported to retain 92.1% of the upper-bound average performance with only 16 visual tokens.

Key insights

DCP-Prune stabilizes ultra-low token pruning by preserving feature distribution consistency, which limits the performance degradation that vision-language models otherwise show at extreme token budgets.

Principles

Feature distribution shift correlates with pruning performance loss.
Transfer contextual information before token removal.
Dynamically re-select tokens to counter distribution shifts.
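
The first principle can be made concrete with a small sketch. This summary does not define the paper's distribution consistency metric, so the function below is a hypothetical stand-in: it compares the per-channel mean and standard deviation of the kept tokens against those of the full token set. The name distribution_shift, the random features, and the token counts are all illustrative assumptions.

```python
import numpy as np

def distribution_shift(full_tokens, kept_tokens, eps=1e-6):
    """Hypothetical shift score (NOT the paper's metric).

    Measures how far the kept tokens' per-channel mean and standard
    deviation drift from those of the full token set. 0 means identical.
    """
    mu_full, mu_kept = full_tokens.mean(axis=0), kept_tokens.mean(axis=0)
    sd_full, sd_kept = full_tokens.std(axis=0), kept_tokens.std(axis=0)
    mean_gap = np.linalg.norm(mu_full - mu_kept) / (np.linalg.norm(mu_full) + eps)
    std_gap = np.linalg.norm(sd_full - sd_kept) / (np.linalg.norm(sd_full) + eps)
    return float(mean_gap + std_gap)

# Illustrative data: 576 random "visual tokens" with 64 channels each.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(576, 64))

shift_full = distribution_shift(tokens, tokens)        # nothing pruned
shift_288 = distribution_shift(tokens, tokens[:288])   # moderate budget
shift_16 = distribution_shift(tokens, tokens[:16])     # ultra-low budget

print(f"full: {shift_full:.3f}  288 kept: {shift_288:.3f}  16 kept: {shift_16:.3f}")
```

On this toy data the score grows as the budget shrinks, which is the qualitative behavior the principle describes. It says nothing about how the actual metric behaves on real model features.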

Method

DCP-Prune employs a two-stage framework. Anchor-Context Graph Recovery (ACGR) transfers contextual information to the retained tokens before removal. Text-Aware Token Cluster Selection (TATCS) then dynamically re-selects tokens when a severe distribution shift is detected.
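
The control flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not specify how ACGR builds its graph, how TATCS clusters tokens, or what threshold triggers re-selection. Here context transfer is approximated by blending each anchor with its most similar pruned tokens, re-selection by a k-means style clustering with a text-similarity pick per cluster, and every function name, the alpha weight, and the threshold value are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def shift_score(full, kept, eps=1e-6):
    # Stand-in shift metric (assumption): relative gap between mean features.
    mu = full.mean(axis=0)
    return float(np.linalg.norm(mu - kept.mean(axis=0)) / (np.linalg.norm(mu) + eps))

def transfer_context(tokens, anchor_idx, alpha=0.5):
    """Stage-1 stand-in: blend each anchor with the mean of the pruned
    tokens whose nearest anchor (by cosine similarity) it is."""
    normed = l2_normalize(tokens)
    owner = (normed @ normed[anchor_idx].T).argmax(axis=1)
    is_anchor = np.zeros(len(tokens), dtype=bool)
    is_anchor[anchor_idx] = True
    out = tokens[anchor_idx].copy()
    for k in range(len(anchor_idx)):
        members = (~is_anchor) & (owner == k)
        if members.any():
            out[k] = (1 - alpha) * out[k] + alpha * tokens[members].mean(axis=0)
    return out

def text_aware_reselect(tokens, text_emb, budget, iters=5, seed=0):
    """Stage-2 stand-in: cluster tokens, then keep the most text-relevant
    token from each non-empty cluster."""
    rng = np.random.default_rng(seed)
    normed = l2_normalize(tokens)
    centers = normed[rng.choice(len(tokens), size=budget, replace=False)]
    for _ in range(iters):
        assign = (normed @ centers.T).argmax(axis=1)
        for k in range(budget):
            if (assign == k).any():
                centers[k] = l2_normalize(normed[assign == k].mean(axis=0))
    assign = (normed @ centers.T).argmax(axis=1)
    relevance = normed @ l2_normalize(text_emb)
    picks = []
    for k in range(budget):
        members = np.flatnonzero(assign == k)
        if members.size:
            picks.append(members[relevance[members].argmax()])
    return np.array(sorted(picks))

def prune(tokens, text_emb, budget=16, threshold=0.5):
    # Initial anchors: the tokens most similar to the text (assumption).
    relevance = l2_normalize(tokens) @ l2_normalize(text_emb)
    anchor_idx = np.sort(np.argsort(-relevance)[:budget])
    kept = transfer_context(tokens, anchor_idx)
    # Re-select only when the shift is judged severe (hypothetical threshold).
    if shift_score(tokens, kept) > threshold:
        anchor_idx = text_aware_reselect(tokens, text_emb, budget)
        kept = transfer_context(tokens, anchor_idx)
    return kept, anchor_idx

rng = np.random.default_rng(0)
tokens = rng.normal(size=(576, 64))   # illustrative visual tokens
text_emb = rng.normal(size=64)        # illustrative text embedding
kept, idx = prune(tokens, text_emb, budget=16)
print(kept.shape, idx)
```

The point of the sketch is the ordering: context is folded into the retained tokens first, and the more expensive re-selection runs only when the shift check fails.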

In practice

Retain 92.1% of the upper-bound average performance on LLaVA-1.5-7B with 16 visual tokens.
Apply the distribution consistency metric to guide pruning decisions.
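
To put the 16-token budget in perspective, the short calculation below compares it with the unpruned visual token count. The figure of 576 tokens is not stated in this summary. It is an assumption based on the standard LLaVA-1.5 setup, which encodes a 336x336 image with 14x14 patches.

```python
# Assumed LLaVA-1.5 vision encoder settings (not given in the summary).
image_size, patch_size = 336, 14
full_tokens = (image_size // patch_size) ** 2   # 24 * 24 = 576

budget = 16  # visual token budget reported in the summary
kept_fraction = budget / full_tokens

print(f"{full_tokens} -> {budget} visual tokens: "
      f"{kept_fraction:.1%} kept, {1 - kept_fraction:.1%} removed")
# prints "576 -> 16 visual tokens: 2.8% kept, 97.2% removed"
```

This counts visual tokens only. Text tokens in the prompt are unaffected by the pruning.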

Topics

Token Pruning
Distribution Consistency
Vision-Language Models
Model Compression
LLaVA-1.5-7B
Ultra-Low Budgets

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.