Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction
Summary
RAVEN (Recurrence-Aware next-Visit EveNt prediction) is a generative pretraining strategy for sequential electronic health record (EHR) data. A model pretrained this way on records from over one million unique individuals autoregressively generates the tokenized clinical events of a patient's next visit, conditioned on that patient's history. A key feature is regularization on the prediction of repeated events, which addresses a common pitfall in evaluating EHR foundation models: event tokens that simply recur from earlier visits can inflate performance metrics. The research also examines scaling behavior in a data-constrained, compute-saturated regime and finds that increasing model size alone is ineffective without a matching increase in data. In zero-shot prediction of disease incidence, RAVEN performs comparably to fine-tuned Transformer models and surpasses simulation-based next-token methods. It also generalizes to external patient cohorts despite discrepancies in clinical code mapping and feature coverage.
Key takeaway
Research scientists developing predictive models from electronic health records should consider adding recurrence-aware regularization to their generative pretraining. The approach, exemplified by RAVEN, can improve next-visit event prediction and guards against inflated performance metrics by distinguishing new disease onsets from recurring events, which matters most when evaluating zero-shot prediction.
Key insights
RAVEN is a generative pretraining strategy for EHRs that predicts next-visit events while accounting for recurrence.
Principles
- Regularize repeated event prediction.
- Distinguish new onsets from subsequent occurrences.
- Data volume is critical for model scaling.
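The distinction between new onsets and subsequent occurrences can be illustrated with a small sketch. This is not RAVEN's code: the function name `split_onsets_and_recurrences` and the example codes are hypothetical, and the sketch assumes each visit is a set of code tokens, with a next-visit code counted as recurring if it appears in any earlier visit.

```python
from typing import Iterable, Set, Tuple


def split_onsets_and_recurrences(
    history: Iterable[Set[str]], next_visit: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Split next-visit codes into new onsets and recurrences.

    Assumption for illustration: a code is a recurrence if it appeared
    in any earlier visit, and a new onset otherwise.
    """
    seen: Set[str] = set()
    for visit in history:
        seen |= visit  # accumulate every code observed so far
    recurrences = next_visit & seen
    new_onsets = next_visit - seen
    return new_onsets, recurrences


# Toy patient: two past visits and one visit to be predicted.
history = [{"E11", "I10"}, {"I10", "J45"}]
next_visit = {"I10", "N18"}
new_onsets, recurrences = split_onsets_and_recurrences(history, next_visit)
print(sorted(new_onsets), sorted(recurrences))  # ['N18'] ['I10']
```

Scoring the two groups separately shows how much of a model's apparent accuracy comes from repeating codes it has already seen.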
Method
RAVEN autoregressively generates tokenized clinical events for a patient's next visit, conditioned on history, with regularization for repeated events to prevent inflated performance metrics.
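This summary does not state the form of the regularization, so the following is only one plausible sketch and not the authors' method. It uses a multi-label binary cross-entropy over the code vocabulary in which target codes already present in the patient's history receive a smaller weight. The names `recurrence_weighted_bce` and `repeat_weight`, and the toy probabilities, are assumptions made for illustration.

```python
import math
from typing import Dict, Set


def recurrence_weighted_bce(
    probs: Dict[str, float],
    next_visit: Set[str],
    seen: Set[str],
    repeat_weight: float = 0.5,
) -> float:
    """Weighted mean binary cross-entropy over the code vocabulary.

    Hypothetical scheme: positive targets that already occurred in the
    history (codes in `seen`) are weighted by `repeat_weight`; every
    other term has weight 1.0.
    """
    eps = 1e-12
    total, weight_sum = 0.0, 0.0
    for code, p in probs.items():
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        y = 1.0 if code in next_visit else 0.0
        w = repeat_weight if (y == 1.0 and code in seen) else 1.0
        total += -w * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


next_visit = {"I10", "N18"}  # I10 is a repeat, N18 is a new onset
seen = {"E11", "I10", "J45"}
good_at_repeats = {"I10": 0.9, "N18": 0.1}
good_at_onsets = {"I10": 0.1, "N18": 0.9}

loss_repeats = recurrence_weighted_bce(good_at_repeats, next_visit, seen)
loss_onsets = recurrence_weighted_bce(good_at_onsets, next_visit, seen)
```

With `repeat_weight=1.0` the two toy models receive the same loss. With a weight below 1.0, the model that predicts the new onset well scores lower, which is the behavior a recurrence-aware objective aims for.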
In practice
- Use RAVEN for zero-shot disease incidence forecasting.
- Apply recurrence regularization in EHR models.
- Prioritize data growth over model size.
Topics
- Generative Pretraining
- Electronic Health Records
- Clinical Event Prediction
- Foundation Models
- Zero-shot Learning
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.