UniECG: Understanding and Generating ECG in One Unified Model

2026-06-18 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Medical Devices & Health Technology · Depth: Expert, long

Summary

UniECG is presented as the first unified model for electrocardiogram (ECG) analysis that can both produce evidence-based ECG interpretations and generate ECGs from text. Existing unified models such as GPT-5 struggle to give accurate medical diagnoses and to generate signals. UniECG addresses this with a decoupled two-stage training strategy. In the first stage, it learns ECG-to-Text interpretation through full-parameter fine-tuning on the ECG-Grounding dataset. In the second stage, it adds Text-to-ECG generation by aligning special [ECG] tokens with the DiffuSETS text encoder in latent space, using textual reports from MIMIC-IV-ECG. As a result, UniECG can autonomously either interpret an ECG or generate a high-fidelity, signal-level ECG from text that specifies attributes such as heart rate, sex, and age. Its understanding performance is slightly below GEM's. It improves substantially over PULSE, raising ECG feature grounding from 41.63 to 69.44 and the overall average from 39.95 to 63.98, and it adds a generative capability that sets it apart.
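The arithmetic behind the reported gains over PULSE is easy to check. The snippet below uses only the four scores quoted above. The relative percentages are derived here and are not figures reported in the source.

```python
# Scores quoted in the summary: (PULSE, UniECG).
SCORES = {
    "ECG feature grounding": (41.63, 69.44),
    "Overall average": (39.95, 63.98),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 2), round(relative, 1)


for metric, (pulse, uniecg) in SCORES.items():
    abs_gain, rel_gain = gains(pulse, uniecg)
    print(f"{metric}: +{abs_gain} points ({rel_gain}% relative to PULSE)")
```

In relative terms, both metrics improve by roughly 60 to 67 percent over PULSE. These numbers say nothing about the gap to GEM, which the summary describes only as slight.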

Key takeaway

For AI scientists and machine learning engineers building medical diagnostic tools, UniECG offers a compelling architecture for multimodal AI. You should consider its two-stage training approach when a model must both interpret complex medical signals such as ECGs and generate high-fidelity, text-conditioned signals. This dual capability can support data augmentation for imbalanced datasets and enable more interactive, evidence-based diagnostic support, which broadens what your medical AI systems can do.
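The generation side could feed data augmentation roughly as follows. This is a sketch under stated assumptions. It counts how many synthetic ECGs each minority class needs to match the largest class. It then builds one text prompt per missing sample, carrying a label plus the heart rate, sex, and age attributes the summary mentions. The prompt template, the attribute ranges, and the function names (`synthesis_plan`, `make_prompt`, `build_requests`) are hypothetical. The prompts would still have to be passed to a Text-to-ECG model, which the sketch does not include.

```python
import random
from collections import Counter


def synthesis_plan(labels: list[str]) -> dict[str, int]:
    """Number of synthetic ECGs each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items() if n < target}


def make_prompt(label: str, heart_rate: int, sex: str, age: int) -> str:
    # Hypothetical report-style prompt; UniECG's actual input format may differ.
    return f"{label}. Heart rate {heart_rate} bpm. Patient: {sex}, {age} years old."


def build_requests(labels: list[str], seed: int = 0) -> list[str]:
    """One generation prompt per missing sample, with sampled attributes."""
    rng = random.Random(seed)
    requests = []
    for label, n_missing in synthesis_plan(labels).items():
        for _ in range(n_missing):
            requests.append(make_prompt(
                label,
                heart_rate=rng.randint(50, 110),  # illustrative range only
                sex=rng.choice(["male", "female"]),
                age=rng.randint(30, 85),          # illustrative range only
            ))
    return requests


labels = ["sinus rhythm"] * 5 + ["atrial fibrillation"] * 2 + ["left bundle branch block"]
print(synthesis_plan(labels))
for prompt in build_requests(labels):
    print(prompt)
```

Drawing attributes uniformly and independently of the label is a simplification. A more realistic plan would sample them from the observed distribution of each class.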

Key insights

UniECG unifies evidence-based ECG interpretation and text-conditioned signal generation in a single model.

Principles

Unified models can extend beyond vision-language to medical signals like ECG.
Decoupling training into stages can preserve capabilities learned first while new ones are added.
Latent-space alignment can inject new modalities into pre-trained LLMs.
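The following sketch shows how decoupling can protect the first stage. It assumes that stage 2 freezes the LLM backbone and trains only the new [ECG] token embeddings and an alignment head, with the DiffuSETS text encoder frozen throughout. The summary does not state this. The parameter-group names are hypothetical. The check only confirms that no group trained in stage 1 is updated again in stage 2.

```python
# Hypothetical parameter groups of a unified ECG model (names are illustrative).
ALL_GROUPS = frozenset({
    "llm_backbone",            # pre-trained LLM weights
    "ecg_token_embeddings",    # embedding rows added for the special [ECG] tokens
    "alignment_head",          # maps LLM hidden states into the text-encoder space
    "diffusion_text_encoder",  # DiffuSETS text encoder, used as the alignment target
})

# Stage 1 follows the summary: full-parameter fine-tuning for ECG-to-Text.
# Stage 2 is an ASSUMPTION: only the newly added pieces are trained.
STAGE_TRAINABLE = {
    "stage1_ecg_to_text": frozenset({"llm_backbone"}),
    "stage2_text_to_ecg": frozenset({"ecg_token_embeddings", "alignment_head"}),
}


def frozen_groups(stage: str) -> frozenset[str]:
    """Groups not updated in the given stage (frozen, or not yet in use)."""
    return ALL_GROUPS - STAGE_TRAINABLE[stage]


def preserves_earlier_stage(earlier: str, later: str) -> bool:
    """True if the later stage never updates a group the earlier stage trained."""
    return STAGE_TRAINABLE[earlier].isdisjoint(STAGE_TRAINABLE[later])


for stage in STAGE_TRAINABLE:
    print(stage, "frozen:", sorted(frozen_groups(stage)))
print("stage 1 preserved:",
      preserves_earlier_stage("stage1_ecg_to_text", "stage2_text_to_ecg"))
```

In a real framework, the new [ECG] rows usually live inside the LLM's embedding matrix. Training them while keeping the backbone frozen therefore typically requires masking gradients on the original rows rather than freezing the whole matrix.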

Method

A two-stage training process first fine-tunes an LLM for ECG-to-Text interpretation, then injects Text-to-ECG generation by extending the LLM vocabulary with special [ECG] tokens and aligning their latent representations with a diffusion model's text encoder.
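Here is a toy version of the second stage, assuming a mean-pooled alignment target. It appends special [ECG] tokens to a vocabulary without changing existing ids. It then projects the LLM hidden states at the [ECG] positions into the text encoder's space and scores them with a mean-squared error against the frozen text encoder's embedding of the same report. The number of tokens, the linear projection, the mean pooling, and the MSE loss are all assumptions chosen for illustration, not UniECG's documented design. The sketch uses plain lists instead of a tensor library.

```python
# Hypothetical: four special tokens; the real count and naming may differ.
ECG_TOKENS = [f"[ECG{i}]" for i in range(4)]


def extend_vocab(vocab: dict[str, int], new_tokens: list[str]) -> dict[str, int]:
    """Give each new special token a fresh id after the existing ones."""
    extended = dict(vocab)
    next_id = max(extended.values(), default=-1) + 1
    for tok in new_tokens:
        if tok not in extended:
            extended[tok] = next_id
            next_id += 1
    return extended


def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    """Linear projection: W holds one row of input weights per output dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def alignment_loss(hidden, token_ids, ecg_ids, W, target):
    """MSE between the mean-pooled, projected [ECG] hidden states and the
    frozen text encoder's embedding of the same report (assumed pooled)."""
    picked = [h for h, t in zip(hidden, token_ids) if t in ecg_ids]
    if not picked:
        raise ValueError("sequence contains no [ECG] tokens")
    projected = [matvec(W, h) for h in picked]
    pooled = [sum(col) / len(projected) for col in zip(*projected)]
    return sum((p - t) ** 2 for p, t in zip(pooled, target)) / len(target)


# Usage with toy 2-dimensional states and an identity projection.
vocab = extend_vocab({"<s>": 0, "sinus": 1, "rhythm": 2}, ECG_TOKENS)
ecg_ids = {vocab[t] for t in ECG_TOKENS}
hidden = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
token_ids = [vocab["sinus"], vocab["[ECG0]"], vocab["[ECG1]"]]
W = [[1.0, 0.0], [0.0, 1.0]]
print(alignment_loss(hidden, token_ids, ecg_ids, W, target=[0.0, 0.0]))  # 1.625
```

In designs of this kind, minimizing this loss lets the projected [ECG] states stand in for the text encoder's output. At inference, the LLM's own states can then condition the diffusion model.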

In practice

Implement a two-stage training strategy for combining medical signal interpretation and generation.
Utilize special tokens and latent space alignment to integrate signal synthesis into LLMs.
Employ diffusion models for high-fidelity, text-conditioned medical signal generation.
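The next sketch illustrates the diffusion side using the standard DDPM formulation: a linear noise schedule, the closed-form forward process x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, and an epsilon-prediction training loss. The schedule values, the signal stand-in, and the placeholder denoiser are assumptions. DiffuSETS may use a different schedule or parameterization, and it may operate on an encoded latent rather than raw samples. The text condition enters only through the denoiser's `cond` argument.

```python
import math
import random


def linear_betas(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Linearly spaced noise variances beta_0 .. beta_{T-1}."""
    if T < 2:
        raise ValueError("need at least two diffusion steps")
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas: list[float]) -> list[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def training_loss(denoiser, x0, cond, abar, rng):
    """Epsilon-prediction MSE at a random timestep; cond is e.g. a text embedding."""
    t = rng.randrange(len(abar))
    xt, eps = q_sample(x0, t, abar, rng)
    eps_hat = denoiser(xt, t, cond)
    return sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / len(eps)


def zero_denoiser(xt, t, cond):
    """Untrained placeholder; a real model would use t and cond."""
    return [0.0] * len(xt)


rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))
toy_ecg = [math.sin(2 * math.pi * k / 50) for k in range(250)]  # stand-in signal
print(training_loss(zero_denoiser, toy_ecg, "Sinus rhythm, 72 bpm", abar, rng))
```

A trained denoiser would drive this loss well below the placeholder's. Sampling then runs the learned reverse process from pure noise, steered by the text condition.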

Topics

ECG Interpretation
ECG Generation
Unified Models
Multimodal LLMs
Medical AI
Diffusion Models

Code references

PKUDigitalHealth/UniECG

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.