UniECG: Understanding and Generating ECG in One Unified Model
Summary
UniECG is presented as the first unified model for electrocardiogram (ECG) analysis that performs both evidence-based ECG interpretation and text-conditioned ECG generation. Existing unified models such as GPT-5 struggle with accurate medical diagnosis and with signal generation; UniECG addresses this with a decoupled two-stage training strategy. In the first stage, it learns ECG-to-Text interpretation through full-parameter fine-tuning on the ECG-Grounding dataset. In the second stage, it adds Text-to-ECG generation by aligning special [ECG] tokens with the DiffuSETS text encoder in latent space, using textual reports from MIMIC-IV-ECG. The resulting model can autonomously either interpret ECG data or generate high-fidelity, signal-level ECGs from text that specifies attributes such as heart rate, sex, and age. On understanding, UniECG scores slightly below GEM but well above PULSE: ECG feature grounding rises from 41.63 to 69.44 and the overall average from 39.95 to 63.98, in addition to the generative capability it adds.
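To put the PULSE comparison in perspective, the short calculation below derives the absolute and relative gains from the four scores quoted in the summary. The helper name `gain` is purely illustrative; only the numbers come from the summary.

```python
def gain(baseline, new):
    """Return (absolute gain in points, relative gain as a percentage of baseline)."""
    absolute = new - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Scores quoted in the summary, as (PULSE, UniECG) pairs.
scores = {
    "ECG feature grounding": (41.63, 69.44),
    "Overall average": (39.95, 63.98),
}

for name, (pulse, uniecg) in scores.items():
    absolute, relative = gain(pulse, uniecg)
    print(f"{name}: +{absolute:.2f} points ({relative:.1f}% relative)")
```

The gains come to 27.81 points on ECG feature grounding and 24.03 points on the overall average, or roughly 67% and 60% relative to PULSE.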
Key takeaway
For AI Scientists and Machine Learning Engineers developing medical diagnostic tools, UniECG offers a compelling architecture for multimodal AI. Consider its two-stage training approach if you need a single model that both interprets complex medical signals such as ECGs and generates high-fidelity, text-conditioned signals. The generation side can support data augmentation for imbalanced datasets, and the interpretation side can provide more interactive, evidence-based diagnostic support, which together broaden the utility of your medical AI systems.
Key insights
UniECG unifies evidence-based ECG interpretation and text-conditioned signal generation in a single model.
Principles
- Unified models can extend beyond vision-language to medical signals like ECG.
- Decoupling complex tasks into stages can preserve initial capabilities while adding new ones.
- Latent space alignment can inject new modalities into pre-trained LLMs.
Method
A two-stage training process first fine-tunes an LLM for ECG-to-Text interpretation, then injects Text-to-ECG generation by extending the LLM vocabulary with special [ECG] tokens and aligning their latent representations with a diffusion model's text encoder.
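The second stage described above can be illustrated with a toy sketch in plain Python. It is not the authors' implementation: the token names `[ECG_0]`, `[ECG_1]`, the single linear projection, the mean-squared-error objective, and all vectors are assumptions chosen only to show the idea of extending a vocabulary and then pulling the new tokens' latents toward a fixed text-encoder embedding.

```python
def extend_vocab(vocab, num_ecg_tokens):
    """Return a copy of vocab with hypothetical [ECG_i] tokens appended."""
    extended = dict(vocab)
    for i in range(num_ecg_tokens):
        extended[f"[ECG_{i}]"] = len(extended)
    return extended


def project(W, h):
    """Linear projection: one output value per row of W."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def alignment_step(W, hidden, target, lr):
    """One gradient-descent step on mse(W @ hidden, target).

    Updates W in place and returns the loss measured before the update.
    """
    pred = project(W, hidden)
    loss = mse(pred, target)
    n = len(target)
    for i, row in enumerate(W):
        grad_out = 2.0 * (pred[i] - target[i]) / n
        for j in range(len(row)):
            row[j] -= lr * grad_out * hidden[j]
    return loss


# Toy vocabulary standing in for the LLM tokenizer (assumption).
vocab = extend_vocab({"<pad>": 0, "normal": 1, "sinus": 2, "rhythm": 3}, num_ecg_tokens=2)

# Stand-ins: an LLM hidden state at an [ECG] token position, and the
# embedding a frozen text encoder would produce for the same report.
hidden = [0.5, -1.0, 0.25, 0.75]
target = [1.0, 0.0, -0.5]

W = [[0.0] * len(hidden) for _ in target]  # trainable projection, zero-initialised
losses = [alignment_step(W, hidden, target, lr=0.1) for _ in range(100)]
print(f"alignment loss: {losses[0]:.4f} -> {losses[-1]:.2e}")
```

In a real system the projection would sit on top of the LLM and the target would come from the diffusion model's text encoder; here both are fixed vectors so the alignment loss can be watched shrinking in isolation.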
In practice
- Implement a two-stage training strategy for combining medical signal interpretation and generation.
- Utilize special tokens and latent space alignment to integrate signal synthesis into LLMs.
- Employ diffusion models for high-fidelity, text-conditioned medical signal generation.
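One way to wire these pieces together at inference time is to inspect the model's output for special tokens and hand their latents to the signal decoder. The sketch below is an assumption about how such routing could look, not a description of UniECG's code: the `[ECG` prefix, the function `route_output`, and the toy hidden states are all hypothetical.

```python
ECG_TOKEN_PREFIX = "[ECG"  # hypothetical naming for the special tokens


def route_output(tokens, hidden_states):
    """Split model output into report text and latents for a signal decoder.

    tokens:        list of decoded token strings
    hidden_states: one latent vector per token, in the same order
    Returns (mode, text, ecg_latents), where mode is "generate" if any
    special ECG token was emitted and "interpret" otherwise.
    """
    text_tokens, ecg_latents = [], []
    for tok, h in zip(tokens, hidden_states):
        if tok.startswith(ECG_TOKEN_PREFIX):
            ecg_latents.append(h)  # would condition the diffusion model
        else:
            text_tokens.append(tok)
    mode = "generate" if ecg_latents else "interpret"
    return mode, " ".join(text_tokens), ecg_latents


# Interpretation: plain text only, nothing is passed to a decoder.
print(route_output(["Sinus", "rhythm"], [[0.1], [0.2]]))

# Generation: latents at the special-token positions are collected.
print(route_output(["Here", "[ECG_0]", "[ECG_1]"], [[0.0], [1.0], [2.0]]))
```

Keeping the routing outside the model means the interpretation path is untouched when no special tokens appear, which mirrors the goal of adding generation without disturbing the first-stage capability.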
Topics
- ECG Interpretation
- ECG Generation
- Unified Models
- Multimodal LLMs
- Medical AI
- Diffusion Models
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.