MLUBench: A Benchmark for Lifelong Unlearning Evaluation in MLLMs
Summary
MLUBench is a new, large-scale benchmark for evaluating lifelong unlearning in Multimodal Large Language Models (MLLMs), a setting in which a model must remove specific content across a sequence of requests while preserving its general capabilities. Existing benchmarks are limited and fail to capture the cumulative degradation observed in MLLMs. MLUBench covers 127 real-world entities across 9 classes, with 5,105 images and 15,414 VQA pairs. Experiments on MLUBench show that current unlearning methods suffer severe, cumulative performance degradation, and they expose a challenge specific to MLLMs: preserving multimodal alignment. To address this, the authors propose LUMoE, a Mixture-of-Experts (MoE) inspired method that uses switchable Low-Rank Adaptation (LoRA) adapters and a GLM-4V-Plus gate module, and that significantly mitigates the degradation. The source code and the MLUBench dataset are open-sourced.
Key takeaway
MLOps engineers deploying MLLMs in privacy-sensitive applications must account for the severe, cumulative degradation caused by sequential unlearning requests. Traditional methods risk corrupting core model capabilities and multimodal alignment. Consider modular approaches like LUMoE, which uses LoRA adapters and dynamic routing to isolate each unlearning task and so preserve general utility. Evaluate unlearning strategies rigorously with benchmarks like MLUBench to ensure long-term model stability.
Key insights
MLLM lifelong unlearning uniquely challenges multimodal alignment, requiring isolated, modular solutions to prevent cumulative degradation.
Principles
- Lifelong unlearning causes severe, cumulative performance degradation in MLLMs.
- Preserving multimodal alignment is crucial for MLLM unlearning.
- Isolating unlearning modifications protects base model stability.
Method
LUMoE employs switchable LoRA adapters as "experts" for specific unlearning tasks, with a GLM-4V-Plus gate module dynamically routing multimodal inputs to the appropriate adapter or the original MLLM.
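The routing idea can be illustrated with a minimal, text-only sketch. It is not the authors' implementation: the models are plain Python functions, the gate is a toy substring match standing in for the GLM-4V-Plus gate module, and the names `SwitchableRouter`, `make_unlearned_adapter`, and the refusal message are all hypothetical. A real system would route image-and-text inputs and switch actual LoRA weights on a shared base model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical stand-in: a "model" is just a function from a query to an answer.
Model = Callable[[str], str]


def base_model(query: str) -> str:
    """Stand-in for the original, unmodified MLLM."""
    return f"base answer to: {query}"


def make_unlearned_adapter(entity: str) -> Model:
    """Stand-in for the base model with one unlearning adapter switched on.

    The refusal text is an assumption made for illustration only.
    """
    def adapted(query: str) -> str:
        return f"I cannot provide information about {entity}."
    return adapted


@dataclass
class SwitchableRouter:
    base: Model
    adapters: Dict[str, Model] = field(default_factory=dict)

    def add_unlearning_task(self, entity: str) -> None:
        # Each new request adds one adapter; earlier adapters and the
        # base model are left untouched.
        self.adapters[entity.lower()] = make_unlearned_adapter(entity)

    def gate(self, query: str) -> Optional[str]:
        # Toy gate: case-insensitive substring match on the entity name.
        lowered = query.lower()
        for entity in self.adapters:
            if entity in lowered:
                return entity
        return None

    def __call__(self, query: str) -> str:
        entity = self.gate(query)
        if entity is None:
            return self.base(query)          # unrelated input: original model
        return self.adapters[entity](query)  # matched input: that task's adapter


router = SwitchableRouter(base=base_model)
router.add_unlearning_task("Alice Example")
router.add_unlearning_task("Bob Sample")
```

Because every unlearning request lives in its own adapter, adding a new one does not modify the base model or the adapters added before it, which is the isolation property the method relies on.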
In practice
- Utilize LoRA adapters for task-specific unlearning to isolate changes.
- Implement a gate module to route unlearning requests dynamically.
- Leverage MLUBench for comprehensive MLLM unlearning evaluation.
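To make the evaluation point concrete, the sketch below tracks forget and retain accuracy after each sequential unlearning step. It is a simplified illustration under stated assumptions, not MLUBench's actual protocol or metrics: `accuracy`, `lifelong_report`, the substring-match scoring, and the toy checkpoints and question sets are all hypothetical, and the inputs are text-only.

```python
from typing import Callable, Dict, List, Sequence, Tuple

QA = Tuple[str, str]          # (question, reference answer)
Model = Callable[[str], str]  # hypothetical stand-in for an MLLM checkpoint


def accuracy(model: Model, pairs: Sequence[QA]) -> float:
    """Fraction of questions whose reference answer appears in the output."""
    if not pairs:
        return 0.0
    correct = sum(1 for q, a in pairs if a.lower() in model(q).lower())
    return correct / len(pairs)


def lifelong_report(
    checkpoints: Sequence[Model],
    forget_sets: Sequence[Sequence[QA]],
    retain_set: Sequence[QA],
) -> List[Dict[str, float]]:
    """Score the checkpoint produced after each sequential unlearning step.

    forget_acc is measured on all forget sets requested so far (lower is
    better); retain_acc is measured on unrelated questions (higher is better).
    """
    rows = []
    for step, model in enumerate(checkpoints, start=1):
        seen = [pair for fs in forget_sets[:step] for pair in fs]
        rows.append({
            "step": step,
            "forget_acc": accuracy(model, seen),
            "retain_acc": accuracy(model, retain_set),
        })
    return rows


def make_model(known: Dict[str, str]) -> Model:
    """Toy checkpoint that answers only the questions it still 'knows'."""
    return lambda q: known.get(q, "unknown")


# Toy data, invented for illustration only.
retain = [("capital of France?", "Paris")]
forget_sets = [[("who is A?", "actor")], [("who is B?", "singer")]]
ckpt1 = make_model({"who is B?": "singer", "capital of France?": "Paris"})
ckpt2 = make_model({})  # second step also wiped the retained knowledge

rows = lifelong_report([ckpt1, ckpt2], forget_sets, retain)
for row in rows:
    print(row)
```

In this toy run the second checkpoint forgets both entities but also loses the retained fact, which is the kind of cumulative side effect that only shows up when every step of the sequence is scored.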
Topics
- Multimodal LLMs
- Machine Unlearning
- Lifelong Learning
- MLLM Benchmarking
- LoRA
- Mixture-of-Experts
- Data Privacy
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.