AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Asymmetric Information Masking (AIM) is a method for mitigating catastrophic forgetting in continual Visual Question Answering (VQA) with modern Vision-Language Models (VLMs). Existing Continual Learning (CL) methods were designed for symmetric, unimodal architectures, whereas the trainable components of a VLM are inherently asymmetric. Because of this mismatch, standard global regularization prioritizes the large language decoder and leaves the smaller visual projection layers exposed to interference, which erodes compositional reasoning. AIM counters this by applying targeted masks based on modality-specific sensitivity, balancing stability and plasticity. Experiments on the VQA v2 and GQA datasets show that AIM achieves state-of-the-art results on both Average Performance (AP) and Average Forgetting (AF), and that it improves generalization to novel skill-concept compositions.
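
The summary names AP and AF without defining them. As a rough guide, the sketch below computes both under the definitions commonly used in continual learning: AP as the mean accuracy over all tasks after training on the last one, and AF as the mean drop from each earlier task's best accuracy to its final accuracy. These definitions, the function names, and the accuracy values are assumptions for illustration; the paper may define the metrics differently.

```python
from statistics import mean


def average_performance(acc):
    """Mean accuracy over all tasks after training on the final task.

    acc[i][j] is the accuracy on task j measured after training on task i.
    """
    return mean(acc[-1])


def average_forgetting(acc):
    """Mean drop from each earlier task's best accuracy to its final accuracy."""
    n = len(acc)
    if n < 2:
        return 0.0  # nothing can be forgotten with a single task
    drops = []
    for j in range(n - 1):  # the last task has no later training to forget it
        best = max(acc[i][j] for i in range(j, n - 1))
        drops.append(best - acc[-1][j])
    return mean(drops)


# Hypothetical accuracies, made up for illustration (not results from the paper).
acc = [
    [0.70, 0.00, 0.00],
    [0.60, 0.80, 0.00],
    [0.50, 0.70, 0.90],
]
ap = average_performance(acc)
af = average_forgetting(acc)
print(f"AP={ap:.2f} AF={af:.2f}")
```

Higher AP and lower AF are better under these definitions, so a method can only improve both by learning new tasks without overwriting earlier ones.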

Key takeaway

For research scientists developing continual learning strategies for Vision-Language Models, AIM targets catastrophic forgetting at its architectural source: the asymmetry between the large language decoder and the smaller visual projection layers. The reported results on VQA v2 and GQA show gains in Average Performance, Average Forgetting, and generalization to novel skill-concept compositions. Consider modality-specific masking when a VLM training pipeline must balance stability and plasticity, particularly on continuous data streams.

Key insights

Asymmetric Information Masking (AIM) prevents catastrophic forgetting in VQA by balancing VLM stability and plasticity.

Principles

Method

AIM applies targeted masks based on modality-specific sensitivity to balance stability and plasticity in asymmetric VLM architectures during continual learning.
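
To make the contrast with global regularization concrete, here is a minimal sketch, not the paper's algorithm. It assumes each trainable parameter has a scalar sensitivity score, that a mask value of 1 marks a parameter as protected from updates, and that masks are chosen by ranking scores. The group names, scores, fractions, and helper functions are all hypothetical; the summary does not say how AIM actually computes or applies its masks.

```python
def top_fraction_mask(scores, fraction):
    """Return a 0/1 mask marking the `fraction` highest-scoring entries."""
    k = round(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def global_mask(groups, fraction):
    """Rank all parameters together, ignoring which module they belong to."""
    names = list(groups)
    flat = [s for name in names for s in groups[name]]
    flat_mask = top_fraction_mask(flat, fraction)
    out, start = {}, 0
    for name in names:
        n = len(groups[name])
        out[name] = flat_mask[start:start + n]
        start += n
    return out


def per_modality_mask(groups, fractions):
    """Rank parameters within each module, with a module-specific fraction."""
    return {name: top_fraction_mask(scores, fractions[name])
            for name, scores in groups.items()}


def masked_update(params, grads, mask, lr):
    """SGD step that skips protected parameters (mask value 1)."""
    return [p - lr * g * (1 - m) for p, g, m in zip(params, grads, mask)]


# Hypothetical sensitivity scores: the decoder's scores dwarf the projector's.
sensitivity = {
    "projector": [0.03, 0.01, 0.02, 0.04],
    "decoder": [0.9, 0.5, 0.7, 0.3, 0.8, 0.6, 0.4, 0.2],
}
g_mask = global_mask(sensitivity, 0.5)
m_mask = per_modality_mask(sensitivity, {"projector": 0.5, "decoder": 0.5})
print(g_mask["projector"], m_mask["projector"])
```

With these made-up scores, the global ranking protects only decoder parameters and leaves every projector parameter free to drift, which mirrors the failure mode described in the summary. Ranking within each module gives the projector its own share of protected parameters.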

Topics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.