Echo2ECG: Enhancing ECG Representations with Cardiac Morphology from Multi-View Echos

2025-08-01 · Source: cs.AI updates on arXiv.org · Field: Health & Wellbeing — Clinical Care & Medical Practice, Medical Devices & Health Technology, Health & Medical Research · Depth: Expert, long

Summary

Echo2ECG is a multimodal self-supervised learning framework that enriches electrocardiography (ECG) representations with cardiac morphology drawn from multi-view echocardiography (Echo) studies. It targets the representational mismatch in prior methods, which align a global ECG signal with a spatially restricted single-view Echo. The framework pairs an ECG encoder initialized from OTiS with an Echo encoder from EchoPrime and uses attention pooling to aggregate multi-view Echo embeddings. With 12.5M trainable parameters, Echo2ECG is reported to outperform state-of-the-art unimodal and multimodal baselines on two tasks: structural cardiac phenotype classification (left ventricular ejection fraction, LVEF, and structural heart disease, SHD) on three datasets (internal, EchoNext, UK Biobank), and phenotype-aware cross-modal Echo retrieval. It does so while being 18x smaller than the largest competing model, EchoingECG.

Key takeaway

For machine learning engineers building cardiac AI models, Echo2ECG offers a way to integrate morphological information into ECG representations. Consider aligning ECGs with multi-view rather than single-view Echos to avoid the representational mismatch and improve accuracy on LVEF and SHD classification. The framework is presented as enabling robust feature extraction even with limited training data, which could reduce computational overhead and speed up model deployment in clinical settings.

Key insights

Aligning global ECGs with multi-view Echos resolves representational mismatch, yielding powerful, lightweight cardiac morphology features.

Principles

Multi-view imaging improves cross-modal representation alignment.
Global electrical signals require comprehensive anatomical context.
Self-supervised learning builds robust, transferable features.

Method

Echo2ECG uses contrastive learning to align ECG and aggregated multi-view Echo embeddings in a shared latent space. It optimizes an ECG encoder, Echo view aggregator, and projection layers, keeping the Echo encoder frozen.
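
The alignment step can be illustrated with a short sketch. The summary says only that contrastive learning is used, so the symmetric InfoNCE (CLIP-style) objective below is an assumption, as are the function names and the temperature of 0.07. Plain Python lists stand in for the tensors a real training loop would use.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(ecg_emb, echo_emb, temperature=0.07):
    """Assumed objective: average of ECG-to-Echo and Echo-to-ECG InfoNCE.

    ecg_emb[i] and echo_emb[i] are projected embeddings of the same study;
    every other pairing in the batch is treated as a negative.
    """
    ecg = [l2_normalize(v) for v in ecg_emb]
    echo = [l2_normalize(v) for v in echo_emb]
    # Cosine similarity matrix, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(e, s)) / temperature for s in echo]
        for e in ecg
    ]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In this toy batch the loss is near zero when each ECG embedding matches its own Echo embedding and large when the pairs are swapped, which is the behavior the shared latent space relies on. With the Echo encoder frozen, gradients of such a loss would reach only the ECG encoder, the view aggregator, and the projection layers.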

In practice

Use attention pooling for multi-view data aggregation.
Initialize encoders with strong pre-trained unimodal models.
Pre-train on paired ECG-Echo studies recorded within 7 days of each other.
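
The first recommendation can be sketched as follows. The summary does not specify the form of attention pooling, so this is a minimal single-query, scaled dot-product variant chosen for illustration. The names `attention_pool` and `query` are hypothetical, and in training `query` would be a learned parameter rather than a fixed list.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_pool(view_embeddings, query):
    """Pool a variable number of per-view embeddings into one vector.

    view_embeddings: list of d-dimensional lists, one per Echo view.
    query: d-dimensional list (hypothetical learned query vector).
    Returns the pooled embedding and the per-view attention weights.
    """
    d = len(query)
    # Scaled dot-product score between the query and each view.
    scores = [
        sum(q * x for q, x in zip(query, view)) / math.sqrt(d)
        for view in view_embeddings
    ]
    weights = softmax(scores)
    # Weighted sum of the view embeddings.
    pooled = [
        sum(w * view[j] for w, view in zip(weights, view_embeddings))
        for j in range(d)
    ]
    return pooled, weights


views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy Echo view embeddings
pooled, weights = attention_pool(views, query=[2.0, 0.0])
print(len(pooled), round(sum(weights), 6))
```

Because the weights come from a softmax over however many views are present, the same aggregator handles studies with different numbers of views, and a zero query reduces to plain mean pooling.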

Topics

Echo2ECG
Multimodal Self-Supervised Learning
ECG Feature Extraction
Cardiac Morphology
Multi-View Echocardiography
LVEF Classification

Code references

michelleespranita/Echo2ECG

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.