AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning
Summary
Asymmetric Information Masking (AIM) is a method for reducing catastrophic forgetting in continual Visual Question Answering (VQA) with modern Vision-Language Models (VLMs). Existing Continual Learning (CL) methods are designed for symmetric, unimodal architectures, whereas the trainable components of a VLM are inherently asymmetric: a large language decoder alongside much smaller visual projection layers. Because of this mismatch, standard global regularization prioritizes the language decoder and leaves the visual projection layers susceptible to interference, which degrades compositional reasoning. AIM mitigates this by applying targeted masks based on modality-specific sensitivity, balancing stability and plasticity. In experiments on VQA v2 and GQA, AIM is reported to achieve state-of-the-art Average Performance (AP) and Average Forgetting (AF) while also improving generalization to novel skill-concept compositions.
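As a rough illustration of the two metrics named above, the sketch below computes AP and AF from an accuracy matrix using the definitions common in the continual learning literature. The summary does not state the paper's exact formulas, so these definitions, the function names, and the sample numbers are assumptions for illustration only.

```python
# Assumed convention: acc[i][j] is the accuracy on task j measured after
# training on task i (tasks are learned in order 0, 1, ..., T-1).
# The numbers below are made up, not results from the paper.

def average_performance(acc):
    """Mean accuracy over all tasks after training on the final task."""
    final = acc[-1]
    return sum(final) / len(final)

def average_forgetting(acc):
    """Mean drop from each earlier task's best past accuracy to its final accuracy."""
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    drops = []
    for j in range(num_tasks - 1):
        # Best accuracy on task j at any stage before the final one.
        best_past = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_past - acc[-1][j])
    return sum(drops) / len(drops)

acc = [
    [70.0, 0.0, 0.0],    # after task 0
    [65.0, 72.0, 0.0],   # after task 1
    [60.0, 68.0, 74.0],  # after task 2
]
ap = average_performance(acc)
af = average_forgetting(acc)
```

Under these definitions a higher AP and a lower AF are better, so a method that improves both retains old tasks without giving up accuracy on new ones.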
Key takeaway
For research scientists developing continual learning strategies for Vision-Language Models, AIM targets a specific cause of catastrophic forgetting: the architectural asymmetry between the large language decoder and the smaller visual projection layers. The reported results on VQA v2 and GQA show state-of-the-art Average Performance and Average Forgetting, along with better generalization to novel skill-concept compositions. AIM's modality-specific masking is worth evaluating for the stability and plasticity of your VLM training pipelines, particularly when training data arrives as a continuous stream.
Key insights
Asymmetric Information Masking (AIM) mitigates catastrophic forgetting in continual VQA by balancing VLM stability and plasticity.
Principles
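- Global regularization on an asymmetric VLM favors the large language decoder, leaving the smaller visual projection layers exposed to forgetting.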
- Modality-specific masking balances stability and plasticity.
Method
AIM applies targeted masks based on modality-specific sensitivity to balance stability and plasticity in asymmetric VLM architectures during continual learning.
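The contrast between global and modality-specific protection can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the summary does not say how sensitivity is measured or how masks are applied, so the sensitivity scores, the "protect the top fraction per group" rule, and every function name here are hypothetical.

```python
# Hypothetical sketch: scores, selection rule, and names are assumptions.

def top_fraction_mask(scores, fraction):
    """Return a 0/1 mask marking the `fraction` highest scores as protected (1)."""
    k = int(round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    protected = set(order[:k])
    return [1 if i in protected else 0 for i in range(len(scores))]

def global_masks(groups, fraction):
    """Rank all parameters together, ignoring which modality they belong to."""
    names = list(groups)
    flat = [s for name in names for s in groups[name]]
    flat_mask = top_fraction_mask(flat, fraction)
    masks, start = {}, 0
    for name in names:
        end = start + len(groups[name])
        masks[name] = flat_mask[start:end]
        start = end
    return masks

def per_modality_masks(groups, fractions):
    """Rank parameters within each modality group, with its own fraction."""
    return {name: top_fraction_mask(groups[name], fractions[name]) for name in groups}

def masked_update(params, grads, mask, lr):
    """Gradient step that leaves protected (mask == 1) parameters unchanged."""
    return [p - lr * g * (1 - m) for p, g, m in zip(params, grads, mask)]

# Made-up sensitivities: a large decoder group with large scores and a
# small projection group with small scores.
groups = {
    "language_decoder": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],
    "visual_projection": [0.05, 0.04],
}
g_masks = global_masks(groups, 0.5)
m_masks = per_modality_masks(
    groups, {"language_decoder": 0.5, "visual_projection": 0.5}
)
new_proj = masked_update([1.0, 1.0], [0.5, 0.5], m_masks["visual_projection"], lr=0.1)
```

With these toy scores, the global ranking spends its whole budget on the decoder and protects nothing in the visual projection, while the per-group ranking protects the most sensitive parameter in each group. That mirrors the imbalance the summary attributes to global regularization.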
In practice
- Apply AIM when training VQA models on a sequence of tasks.
- Use AIM to improve generalization to novel skill-concept compositions in VLMs.
Topics
- Visual Question Answering
- Continual Learning
- Vision-Language Models
- Asymmetric Information Masking
- Catastrophic Forgetting
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.