DCP-Prune: Ultra-Low Token Pruning with Distribution Consistency Preservation

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

DCP-Prune is a two-stage token pruning framework designed to maintain model performance under ultra-low token budgets, where existing methods become unstable. Traditional vision token pruning techniques often suffer significant accuracy degradation when token counts are severely reduced, a problem the authors link to a growing shift in the feature distribution of the retained tokens. DCP-Prune introduces a lightweight distribution consistency metric to quantify this shift. Its first stage, Anchor-Context Graph Recovery (ACGR), transfers contextual information to the retained tokens before the others are removed. Its second stage, Text-Aware Token Cluster Selection (TATCS), dynamically re-selects representative tokens when a substantial distribution shift is detected. Experiments show that DCP-Prune achieves better and more stable performance than prior methods, retaining 92.1% of the upper-bound average performance on LLaVA-1.5-7B with only 16 visual tokens.
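The summary does not give the formula for the distribution consistency metric, so the following is only an illustrative stand-in, not the paper's definition. It assumes the shift can be approximated by how far the per-dimension mean and standard deviation of the kept tokens drift from those of the full token set. The names `feature_stats` and `distribution_shift` are hypothetical.

```python
import math

def feature_stats(tokens):
    """Per-dimension mean and (population) standard deviation of token vectors."""
    n, dim = len(tokens), len(tokens[0])
    mean = [sum(t[d] for t in tokens) / n for d in range(dim)]
    std = [math.sqrt(sum((t[d] - mean[d]) ** 2 for t in tokens) / n)
           for d in range(dim)]
    return mean, std

def distribution_shift(full_tokens, kept_tokens, eps=1e-8):
    """Hypothetical shift score (an assumption, not the paper's metric):
    relative change in feature mean plus relative change in feature std.
    A score of 0.0 means both moments are unchanged by pruning."""
    mu_full, sd_full = feature_stats(full_tokens)
    mu_kept, sd_kept = feature_stats(kept_tokens)

    def relative_distance(ref, other):
        num = math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, other)))
        den = math.sqrt(sum(a * a for a in ref)) + eps
        return num / den

    return relative_distance(mu_full, mu_kept) + relative_distance(sd_full, sd_kept)

# Toy 2-D features: two tokens near [1, 0] and two near [0, 1].
full = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
balanced = [full[0], full[2]]   # one token from each region
skewed = [full[0], full[1]]     # both tokens from the same region

print(distribution_shift(full, balanced))
print(distribution_shift(full, skewed))
```

In this toy setup the skewed selection scores a much larger shift than the balanced one, which is the kind of signal a pruning method could use to decide whether a given token subset still represents the original features.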

Key takeaway

For machine learning engineers optimizing vision-language models for extreme efficiency, DCP-Prune targets the ultra-low token regime. If you need to run a large vision-language model such as LLaVA-1.5-7B on resource-constrained hardware, this two-stage framework is worth investigating: by actively managing feature distribution consistency, it retains 92.1% of the upper-bound average performance with only 16 visual tokens.

Key insights

DCP-Prune stabilizes ultra-low token pruning by preserving feature distribution consistency, which limits the performance degradation that vision-language models otherwise show at very small token budgets.

Principles

Method

DCP-Prune employs a two-stage framework. First, Anchor-Context Graph Recovery (ACGR) transfers contextual information to the retained tokens before the remaining tokens are removed. Second, Text-Aware Token Cluster Selection (TATCS) dynamically re-selects representative tokens when a severe distribution shift is detected.
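The control flow of such a two-stage scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: stage 1 is approximated by folding each dropped token into its most similar anchor, stage 2 by clustering tokens around diverse seeds and keeping the most text-similar token per cluster, and the shift check by a relative distance between feature means. All function names, the `scores` input, and the `threshold` value are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b, eps=1e-8):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + eps)

def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def mean_shift(full, kept, eps=1e-8):
    """Hypothetical shift score: relative distance between feature means."""
    mu_full, mu_kept = mean_vec(full), mean_vec(kept)
    diff = [a - b for a, b in zip(mu_full, mu_kept)]
    return math.sqrt(dot(diff, diff)) / (math.sqrt(dot(mu_full, mu_full)) + eps)

def recover_context(tokens, anchor_ids):
    """Stage 1 stand-in: average each dropped token into its nearest anchor."""
    groups = {i: [tokens[i]] for i in anchor_ids}
    for j, tok in enumerate(tokens):
        if j not in groups:
            nearest = max(anchor_ids, key=lambda i: cosine(tokens[i], tok))
            groups[nearest].append(tok)
    return [mean_vec(groups[i]) for i in anchor_ids]

def text_aware_reselect(tokens, text_vec, budget):
    """Stage 2 stand-in: pick diverse seeds, cluster tokens around them,
    then keep the token most similar to the text in each cluster."""
    ids = range(len(tokens))
    seeds = [max(ids, key=lambda i: cosine(tokens[i], text_vec))]
    while len(seeds) < budget:
        # Next seed: the token least similar to every seed chosen so far.
        seeds.append(min((i for i in ids if i not in seeds),
                         key=lambda i: max(cosine(tokens[i], tokens[s]) for s in seeds)))
    clusters = {s: [] for s in seeds}
    for i in ids:
        clusters[max(seeds, key=lambda s: cosine(tokens[i], tokens[s]))].append(i)
    return [max(members, key=lambda i: cosine(tokens[i], text_vec))
            for members in clusters.values() if members]

def prune(tokens, scores, text_vec, budget, threshold=0.1):
    budget = min(budget, len(tokens))
    # Initial anchors: highest-scoring tokens (e.g. by attention; an assumption).
    anchor_ids = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:budget]
    kept = recover_context(tokens, anchor_ids)
    if mean_shift(tokens, kept) > threshold:  # severe shift: re-select
        kept = recover_context(tokens, text_aware_reselect(tokens, text_vec, budget))
    return kept

tokens = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
scores = [0.9, 0.8, 0.7, 0.1]   # the text-relevant token has the lowest score
text_vec = [0.0, 1.0]
kept = prune(tokens, scores, text_vec, budget=2)
print(kept)
```

In this toy run, score-only selection keeps two near-duplicate tokens and drops the one aligned with the text; the shift check fires and the re-selection step brings that token back. The thresholds and similarity measures in the actual method may differ substantially.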

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.