Decoding the Multimodal Maze: A Systematic Review on the Adoption of Explainability in Multimodal Attention-based Models

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

This systematic literature review, covering research from January 2020 to early 2024, analyzes the adoption of explainability in multimodal attention-based models. It reveals that most studies focus on vision-language and language-only models, primarily employing attention-based techniques for explanation. A critical finding is the non-systematic nature of XAI evaluation methods in multimodal settings, which often lack consistency, robustness, and consideration for modality-specific cognitive factors. The review provides comprehensive recommendations to foster rigorous, transparent, and standardized evaluation and reporting practices, aiming to support the development of more interpretable, accountable, and responsible multimodal AI systems.

Key takeaway

AI Scientists and Research Scientists developing multimodal attention-based models should treat explainability as a core design objective, not an afterthought. Focus on developing and adopting standardized, cognition- and domain-aware evaluation metrics that explicitly quantify inter-modal interactions, rather than relying on ad hoc qualitative analyses. Doing so helps ensure that models are not only performant but also transparent, trustworthy, and aligned with increasing regulatory requirements.
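To make "quantifying inter-modal interactions" concrete, here is a minimal sketch of one possible measurement: the average attention mass that tokens of one modality place on tokens of each modality. It is an illustration only and not a metric taken from the review. The function name `modality_attention_flow`, the toy attention matrix, and the modality labels are all hypothetical, and raw attention weights are not guaranteed to be faithful explanations of model behavior.

```python
from collections import defaultdict


def modality_attention_flow(attn, modalities):
    """Average attention mass sent from each query modality to each key modality.

    attn: square list of rows; attn[i][j] is the weight that query token i
          places on key token j (each row is assumed to sum to 1).
    modalities: one label per token, e.g. "text" or "image" (hypothetical labels).
    Returns a dict mapping (query_modality, key_modality) to the mean mass.
    """
    n = len(modalities)
    if len(attn) != n or any(len(row) != n for row in attn):
        raise ValueError("attn must be square and match the number of labels")

    totals = defaultdict(float)  # (query_modality, key_modality) -> summed mass
    counts = defaultdict(int)    # query_modality -> number of query tokens
    for i, row in enumerate(attn):
        counts[modalities[i]] += 1
        for j, weight in enumerate(row):
            totals[(modalities[i], modalities[j])] += weight

    # Normalize by the number of query tokens in the source modality.
    return {pair: mass / counts[pair[0]] for pair, mass in totals.items()}


# Toy example: two text tokens followed by two image tokens (made-up weights).
attn = [
    [0.7, 0.1, 0.2, 0.0],
    [0.2, 0.4, 0.2, 0.2],
    [0.5, 0.1, 0.3, 0.1],
    [0.0, 0.2, 0.4, 0.4],
]
modalities = ["text", "text", "image", "image"]

flow = modality_attention_flow(attn, modalities)
for (src, dst), mass in sorted(flow.items()):
    print(f"{src} -> {dst}: {mass:.2f}")
```

Reporting a number like this alongside qualitative attention maps is one way to make cross-modal claims comparable across models; a fuller evaluation would also need to check whether the attention pattern actually reflects what drives the prediction.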

Key insights

Multimodal XAI evaluation lacks standardization, hindering interpretable and responsible AI development.

Method

The review proposes a comprehensive guideline for advancing standardized practices in incorporating explainability into multimodal models and streamlining their evaluation across tasks and domains.

Best for: AI Scientist, Research Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.