MARCUS: An agentic, multimodal vision-language model for cardiac diagnosis and management
Summary
MARCUS (Multimodal Autonomous Reasoning and Chat for Ultrasound and Signals) is an agentic vision-language system designed for comprehensive interpretation of cardiac tests, including electrocardiograms (ECGs), echocardiograms, and cardiac magnetic resonance imaging (CMR). It handles both single-modality inputs and combined multimodal data, addressing limitations of current AI models. MARCUS uses a hierarchical agentic architecture: modality-specific vision-language expert models, each with a domain-trained visual encoder and multi-stage language model optimization, are coordinated by a multimodal orchestrator. Trained on 13.5 million images and an expert-curated dataset of 1.6 million questions, MARCUS achieved state-of-the-art performance, surpassing frontier models such as GPT-5 Thinking and Gemini 2.5 Pro Deep Think. It reached 87-91% accuracy for ECG, 67-86% for echocardiography, and 85-88% for CMR, outperforming frontier models by 34-45%. On multimodal cases, MARCUS achieved 70% accuracy, about 2.5 to 3.2 times the 22-28% of frontier models, and its free-text quality scores were 1.7-3.0x higher. The agentic design also provides resistance to "mirage reasoning."
Key takeaway
For AI Scientists developing diagnostic tools in cardiology, MARCUS demonstrates that an agentic, multimodal approach significantly improves accuracy and reasoning quality over current frontier models. You should consider adopting hierarchical agentic architectures with domain-specific visual encoders to enhance the reliability and performance of your medical AI systems, especially when integrating diverse data types. This approach can lead to more robust and clinically useful diagnostic support.
Key insights
MARCUS is an agentic, multimodal vision-language model that excels at cardiac diagnosis by integrating ECG, echocardiography, and CMR data.
Principles
- Agentic architecture enhances multimodal interpretation.
- Domain-specific encoders improve diagnostic accuracy.
- Multimodal integration outperforms single-modality analysis.
Method
MARCUS employs a hierarchical agentic architecture with modality-specific vision-language expert models, integrating domain-trained visual encoders and multi-stage language model optimization, coordinated by a multimodal orchestrator.
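The orchestrator-plus-experts layout described above can be illustrated with a minimal Python sketch. This is not the MARCUS implementation: the names `Study`, `ExpertReport`, `make_stub_expert`, and `Orchestrator` are hypothetical, the experts are stubs standing in for the modality-specific vision-language models, and the way reports are merged (simple concatenation) is an assumption made only to keep the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Study:
    """One cardiac test. All fields are illustrative placeholders."""
    modality: str  # e.g. "ecg", "echo", or "cmr"
    data: str      # stands in for the actual image or signal payload


@dataclass
class ExpertReport:
    modality: str
    findings: str


# An expert maps (study, question) to a report. In a real system this
# would wrap a modality-specific vision-language model; here it is a stub.
Expert = Callable[[Study, str], ExpertReport]


def make_stub_expert(modality: str) -> Expert:
    """Build a placeholder expert that echoes the question (hypothetical)."""
    def expert(study: Study, question: str) -> ExpertReport:
        return ExpertReport(modality, f"[{modality} findings for: {question}]")
    return expert


class Orchestrator:
    """Routes each study to its modality expert and merges the reports."""

    def __init__(self, experts: Dict[str, Expert]):
        self.experts = experts

    def answer(self, studies: List[Study], question: str) -> str:
        reports = []
        for study in studies:
            expert = self.experts.get(study.modality)
            if expert is None:
                raise ValueError(f"no expert registered for {study.modality!r}")
            reports.append(expert(study, question))
        # Assumption: merge by concatenation. A real orchestrator would
        # reason over the expert reports rather than just join them.
        return " | ".join(f"{r.modality}: {r.findings}" for r in reports)


orchestrator = Orchestrator(
    {m: make_stub_expert(m) for m in ("ecg", "echo", "cmr")}
)
summary = orchestrator.answer(
    [Study("ecg", "<ecg image>"), Study("echo", "<echo clip>")],
    "Is there evidence of left ventricular hypertrophy?",
)
print(summary)
```

Keeping the experts behind a single callable interface means a single-modality query and a multimodal query follow the same code path; only the number of studies passed to `answer` changes.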
In practice
- Interpret ECGs, echocardiograms, and CMR images.
- Combine multiple cardiac test modalities in a single interpretation.
- Reduce "mirage reasoning" in diagnostic AI.
Topics
- MARCUS
- Multimodal AI
- Cardiac Diagnosis
- Vision-Language Models
- Agentic AI
Best for: AI Scientist, AI Researcher, AI Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.