DCP-Prune: Ultra-Low Token Pruning with Distribution Consistency Preservation
Summary
DCP-Prune is a novel two-stage token pruning framework designed to maintain model performance under ultra-low token budgets, addressing the instability of existing methods. Traditional vision token pruning techniques often suffer significant accuracy degradation when token counts are severely reduced, a problem linked to increased feature distribution shifts. DCP-Prune introduces a lightweight distribution consistency metric to quantify this shift. Its first stage, Anchor-Context Graph Recovery (ACGR), transfers contextual information before token removal. The second stage, Text-Aware Token Cluster Selection (TATCS), dynamically re-selects representative tokens when substantial distribution shifts are detected. Experiments show DCP-Prune achieves superior and more stable performance, notably retaining 92.1% of the upper-bound average performance on LLaVA-1.5-7B using only 16 visual tokens.
Key takeaway
For machine learning engineers optimizing vision-language models for extreme efficiency, DCP-Prune offers a robust approach to ultra-low token pruning. If you aim to deploy large vision-language models such as LLaVA-1.5-7B on resource-constrained devices, this two-stage framework is worth investigating. By actively managing feature distribution consistency, it retains 92.1% of the upper-bound average performance with as few as 16 visual tokens.
Key insights
DCP-Prune stabilizes ultra-low token pruning by preserving feature distribution consistency, which limits the performance degradation that vision-language models otherwise suffer at very small token budgets.
Principles
- Feature distribution shift correlates with pruning performance loss.
- Transfer contextual information before token removal.
- Dynamically re-select tokens to counter distribution shifts.
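The first principle can be made concrete with a small sketch. The summary does not define the distribution consistency metric itself, so the score below is only an assumed stand-in: it compares the per-dimension mean and standard deviation of the token features before and after pruning. The names `feature_stats` and `distribution_shift` are hypothetical and do not come from the original work.

```python
import math


def feature_stats(tokens):
    """Return per-dimension mean and standard deviation of a list of token vectors."""
    n, dim = len(tokens), len(tokens[0])
    means = [sum(t[d] for t in tokens) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((t[d] - means[d]) ** 2 for t in tokens) / n)
        for d in range(dim)
    ]
    return means, stds


def distribution_shift(full_tokens, kept_tokens):
    """Hypothetical shift score (not the paper's metric).

    Averages the absolute gap in per-dimension mean and in per-dimension
    standard deviation between the full token set and the kept subset.
    A score of 0 means both statistics are unchanged.
    """
    m_full, s_full = feature_stats(full_tokens)
    m_kept, s_kept = feature_stats(kept_tokens)
    dim = len(m_full)
    mean_gap = sum(abs(a - b) for a, b in zip(m_full, m_kept)) / dim
    std_gap = sum(abs(a - b) for a, b in zip(s_full, s_kept)) / dim
    return mean_gap + std_gap
```

Under this assumed score, keeping every token gives a shift of zero, and the score grows as the kept subset becomes less representative of the full set, which is the behavior the first principle relies on.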
Method
DCP-Prune employs a two-stage framework. Anchor-Context Graph Recovery (ACGR) first transfers contextual information before tokens are removed. Text-Aware Token Cluster Selection (TATCS) then dynamically re-selects representative tokens when a substantial distribution shift is detected.
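The control flow of the two stages can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary gives no algorithmic detail for ACGR or TATCS, so `recover_context` simply averages each dropped token into its most similar kept token, and `text_aware_reselect` uses a greedy relevance-minus-redundancy pick in place of the paper's clustering. All function names, the `shift_fn` callback, and the `threshold` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recover_context(tokens, keep_idx):
    """Stage 1 stand-in: average each dropped token into its most similar kept token."""
    groups = {i: [tokens[i]] for i in keep_idx}
    for j, tok in enumerate(tokens):
        if j in groups:
            continue
        nearest = max(keep_idx, key=lambda i: cosine(tokens[i], tok))
        groups[nearest].append(tok)
    dim = len(tokens[0])
    return [
        [sum(t[d] for t in groups[i]) / len(groups[i]) for d in range(dim)]
        for i in keep_idx
    ]


def text_aware_reselect(tokens, text_vec, budget):
    """Stage 2 stand-in: greedy pick of text-relevant, mutually dissimilar tokens."""
    chosen = [max(range(len(tokens)), key=lambda i: cosine(tokens[i], text_vec))]
    while len(chosen) < budget:
        def score(i):
            # Penalize candidates that resemble an already chosen token.
            redundancy = max(cosine(tokens[i], tokens[c]) for c in chosen)
            return cosine(tokens[i], text_vec) - redundancy

        rest = [i for i in range(len(tokens)) if i not in chosen]
        chosen.append(max(rest, key=score))
    return sorted(chosen)


def prune(tokens, text_vec, keep_idx, shift_fn, threshold):
    """Run stage 1, then fall back to stage 2 only if the measured shift is too large."""
    kept = recover_context(tokens, keep_idx)
    if shift_fn(tokens, kept) > threshold:
        keep_idx = text_aware_reselect(tokens, text_vec, len(keep_idx))
        kept = recover_context(tokens, keep_idx)
    return keep_idx, kept
```

The point of the sketch is the ordering: context is folded into the kept tokens before anything is discarded, and re-selection runs only when the shift measure exceeds the threshold, so the cheaper first stage handles the common case.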
In practice
- Retain 92.1% of the upper-bound average performance on LLaVA-1.5-7B with only 16 visual tokens.
- Use the distribution consistency metric to detect when pruning has shifted the feature distribution.
Topics
- Token Pruning
- Distribution Consistency
- Vision-Language Models
- Model Compression
- LLaVA-1.5-7B
- Ultra-Low Budgets
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.