TOPS: First-Principles Visual Token Pruning via Constructing Token Optimal Preservation Sets for Efficient MLLM Inference

Source: Artificial Intelligence · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

TOPS is a visual token pruning module that targets the computational overhead Multimodal Large Language Models (MLLMs) incur from processing large numbers of visual tokens. Existing pruning methods tend to fall short in one of two ways: they retain redundant tokens, or they are instruction-agnostic. The paper revisits visual token pruning from first principles and formulates it as the construction of Token Optimal Preservation Sets, derived through a top-down information-theoretic analysis. That analysis yields three principles for effective token selection: Task Relevance, Information Coverage, and Semantic Diversity. The module is training-free and model-agnostic, and it is reported to deliver superior performance across 7 MLLM backbones and 14 benchmarks. On LLaVA-NeXT, TOPS removes 77.8% of visual tokens while preserving 100.0% and 100.6% of performance on the 7B and 13B models, respectively, a result that suggests potential for hallucination mitigation and lightweight MLLM design.

Key takeaway

For Machine Learning Engineers optimizing MLLM inference efficiency, TOPS offers a principled, training-free way to cut the visual token count sharply. On LLaVA-NeXT it removes 77.8% of visual tokens while preserving 100.0% (7B) and 100.6% (13B) of performance. Because the module needs no retraining and is model-agnostic, it can be added to an existing inference pipeline to improve efficiency; the results also suggest potential for mitigating hallucination and for more lightweight MLLM designs.
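To get a feel for what a 77.8% reduction means in practice, the arithmetic can be sketched as follows. The total of 2880 visual tokens and the 64 text tokens are hypothetical inputs chosen for illustration; neither figure comes from this summary. The cost model (self-attention cost growing with the square of sequence length) is a common simplification, not a measurement.

```python
def kept_tokens(total_visual_tokens: int, removed_fraction: float) -> int:
    """Visual tokens remaining after pruning `removed_fraction` of them."""
    return round(total_visual_tokens * (1.0 - removed_fraction))


def attention_cost_ratio(visual_before: int, visual_after: int, text_tokens: int) -> float:
    """Pruned / unpruned self-attention cost, assuming cost ~ (sequence length)^2."""
    before = visual_before + text_tokens
    after = visual_after + text_tokens
    return (after / before) ** 2


# Hypothetical inputs: neither number is stated in the summary.
ASSUMED_VISUAL_TOKENS = 2880
ASSUMED_TEXT_TOKENS = 64

kept = kept_tokens(ASSUMED_VISUAL_TOKENS, 0.778)
ratio = attention_cost_ratio(ASSUMED_VISUAL_TOKENS, kept, ASSUMED_TEXT_TOKENS)
print(f"kept {kept} of {ASSUMED_VISUAL_TOKENS} visual tokens")
print(f"attention cost is roughly {ratio:.1%} of the unpruned cost")
```

Actual savings depend on where in the model pruning is applied and on components whose cost grows only linearly with sequence length, so this ratio is best read as an optimistic bound on the attention term alone.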

Key insights

TOPS formulates visual token pruning from first principles, constructing Token Optimal Preservation Sets for efficient MLLM inference.

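Principles

Token selection in TOPS follows three principles: Task Relevance, Information Coverage, and Semantic Diversity.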

Method

TOPS is a training-free, model-agnostic pruning module that applies a top-down information-theoretic analysis to construct Token Optimal Preservation Sets based on three fundamental principles.
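The summary does not give the TOPS scoring functions, so the following is only an illustrative sketch of how the three principles could be traded off in a greedy selection loop. It is not the authors' algorithm. The function `select_tokens`, the weights `w_rel`, `w_cov`, `w_div`, and the use of cosine similarity are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_tokens(visual, query, keep, w_rel=1.0, w_cov=1.0, w_div=1.0):
    """Greedily pick `keep` token indices (hypothetical scoring, not TOPS itself).

    visual: list of visual token embeddings
    query:  a single embedding standing in for the instruction
    """
    n = len(visual)
    keep = min(keep, n)
    sim = [[cosine(visual[i], visual[j]) for j in range(n)] for i in range(n)]
    relevance = [cosine(v, query) for v in visual]  # task relevance
    best_cover = [0.0] * n  # how well each token is represented by the kept set
    selected, remaining = [], set(range(n))

    while len(selected) < keep:
        def score(i):
            # Information coverage: improvement in how well all tokens are represented.
            cov_gain = sum(max(sim[i][j] - best_cover[j], 0.0) for j in range(n)) / n
            # Semantic diversity: penalize similarity to tokens already kept.
            redundancy = max((sim[i][s] for s in selected), default=0.0)
            return w_rel * relevance[i] + w_cov * cov_gain - w_div * redundancy

        best = max(sorted(remaining), key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
        best_cover = [max(best_cover[j], sim[best][j]) for j in range(n)]

    return sorted(selected)


# Toy input: tokens 0 and 1 are near-duplicates close to the query; token 2 is distinct.
tokens = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
instruction = [1.0, 0.2]
balanced = select_tokens(tokens, instruction, keep=2)
relevance_only = select_tokens(tokens, instruction, keep=2, w_cov=0.0, w_div=0.0)
print(balanced, relevance_only)
```

In this toy case, ranking by relevance alone keeps both near-duplicate tokens, while the combined score keeps one of them plus the distinct token. That contrast mirrors the redundancy problem the summary attributes to existing methods. The pairwise similarity matrix makes the sketch quadratic in the number of tokens, which is fine for illustration but would need a more efficient formulation at real token counts.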

Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.