Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Health & Medical Research · Depth: Advanced, quick

Summary

RAVEN (Recurrence-Aware next-Visit EveNt prediction) is a generative pretraining strategy for sequential electronic health record (EHR) data. A model trained this way on records from over one million unique individuals autoregressively generates the tokenized clinical events of a patient's next visit, conditioned on that patient's history. A key feature is regularization of repeated-event prediction, which addresses a common pitfall in evaluating EHR foundation models: event tokens that simply recur from earlier visits can inflate performance metrics. The research also examines scaling behavior in a data-constrained, compute-saturated regime and finds that increasing model size alone is ineffective without a proportional increase in data. In zero-shot prediction of disease incidence, RAVEN performs comparably to fine-tuned Transformer models and surpasses simulation-based next-token methods. It also generalizes to external patient cohorts despite discrepancies in clinical code mapping and feature coverage.

Key takeaway

Research scientists developing predictive models on electronic health records should consider integrating recurrence-aware regularization into their generative pretraining strategies. This approach, exemplified by RAVEN, can improve the accuracy of next-visit event predictions and prevent inflated performance metrics by distinguishing new disease onsets from recurring events, which matters most when evaluating zero-shot prediction.
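One way to see why repeated events can inflate metrics is to score a trivial baseline that only copies a patient's history. The sketch below is not RAVEN's evaluation protocol: the function name `split_recall`, the set-based representation of a visit, and the example code strings are all assumptions made for illustration.

```python
def split_recall(predicted, next_visit, history):
    """Recall on next-visit codes, reported overall and split by recurrence.

    predicted:  set of codes the model predicts for the next visit
    next_visit: set of codes actually recorded in the next visit
    history:    set of codes recorded in any earlier visit
    """
    new_onset = next_visit - history   # codes appearing for the first time
    repeated = next_visit & history    # codes already seen before

    def recall(true_codes):
        # Recall is undefined when there is nothing to find.
        if not true_codes:
            return None
        return len(predicted & true_codes) / len(true_codes)

    return {
        "overall": recall(next_visit),
        "new_onset": recall(new_onset),
        "repeated": recall(repeated),
    }


# Illustrative patient: two chronic codes recur, one code is new.
history = {"E11", "I10"}
next_visit = {"E11", "I10", "N18"}

# Baseline that predicts "whatever happened before happens again".
copy_history = set(history)
scores = split_recall(copy_history, next_visit, history)
print(scores)
```

In this toy case the copy-history baseline recovers two of the three next-visit codes overall, yet its recall on the new onset is 0.0. Reporting the two groups separately keeps that gap visible.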

Key insights

RAVEN is a generative pretraining strategy for EHRs that predicts next-visit events while accounting for recurrence.

Method

RAVEN autoregressively generates tokenized clinical events for a patient's next visit, conditioned on the patient's history, and regularizes the prediction of repeated events so that recurring tokens do not inflate performance metrics.
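The summary does not state the exact form of RAVEN's regularization. As a rough illustration of the general idea only, the sketch below treats the next visit as a multi-label target over a code vocabulary and down-weights codes that already appear in the patient's history. The function name `recurrence_weighted_bce`, the `repeat_weight` parameter, and the weighting scheme itself are hypothetical and should not be read as the authors' method.

```python
import math


def recurrence_weighted_bce(probs, next_visit, history, repeat_weight=0.5):
    """Weighted mean binary cross-entropy over a code vocabulary.

    probs:         dict mapping code -> predicted probability for the next visit
    next_visit:    set of codes actually recorded in the next visit
    history:       set of codes recorded in any earlier visit
    repeat_weight: hypothetical weight (< 1) for codes already in history
    """
    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    total = 0.0
    weight_sum = 0.0
    for code, p in probs.items():
        p = min(max(p, eps), 1.0 - eps)
        target = 1.0 if code in next_visit else 0.0
        weight = repeat_weight if code in history else 1.0
        bce = -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
        total += weight * bce
        weight_sum += weight
    return total / weight_sum


# Illustrative case: the model is confident about the recurring code "A"
# but assigns low probability to the new-onset code "B".
probs = {"A": 0.9, "B": 0.2}
next_visit = {"A", "B"}
history = {"A"}

plain = recurrence_weighted_bce(probs, next_visit, history, repeat_weight=1.0)
weighted = recurrence_weighted_bce(probs, next_visit, history, repeat_weight=0.5)
print(plain, weighted)
```

With `repeat_weight=1.0` the easy recurring code dilutes the loss. Lowering the weight shifts the objective toward the missed new onset, so the weighted loss is larger here. A real model would apply the same idea to logits inside a training loop.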

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.