EyeMVP: OCT-Informed Fundus Representation Learning via Paired CFP-OCT Pretraining

2026-06-13 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Medical Imaging AI · Depth: Expert, quick

Summary

EyeMVP is a cross-modal retinal foundation model that enriches color fundus photography (CFP) representations with depth-resolved information from optical coherence tomography (OCT). It was pretrained on 674,893 same-eye, same-day paired CFP-OCT image triples from 112,642 patients across eight hospitals in China. Pretraining uses cross-modal masked reconstruction, with source-constrained cross-attention and CFP-derived structural masks to relate the two modalities despite their non-aligned imaging geometries. At inference EyeMVP needs only CFP images, which makes it suitable for screening. Across 16 downstream tasks, including classification and segmentation, it consistently outperformed other retinal foundation models, particularly on tasks involving macular and optic nerve structure. On macular diseases that are hard to detect on CFP, it achieved an AUROC of 0.948 for macular edema, significantly higher than EyeCLIP's 0.852, and an AUROC of 0.825 for myopic macular schisis. In an exploratory reader study, EyeMVP surpassed junior and intermediate ophthalmologists on macular edema and showed numerically higher balanced accuracy than all reader groups on myopic macular schisis.
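For readers less familiar with the two metrics quoted in the summary, the following is a minimal sketch of how AUROC and balanced accuracy are defined. The function names and the toy labels and scores are illustrative assumptions; they are not data or code from the paper.

```python
def auroc(labels, scores):
    """Probability that a random positive case is scored above a random
    negative case, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, preds):
    """Mean of sensitivity and specificity for binary labels and predictions."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


# Toy example (made-up values, not results from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_preds = [0, 0, 0, 1]
print(auroc(toy_labels, toy_scores))             # 0.75
print(balanced_accuracy(toy_labels, toy_preds))  # 0.75
```

AUROC is threshold-free, whereas balanced accuracy depends on a chosen decision threshold, which is why the latter is the natural metric for comparing a model against human readers who give yes/no answers.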

Key takeaway

For AI scientists building diagnostic tools for ophthalmology, EyeMVP shows how OCT supervision during pretraining can strengthen a CFP-only screening model. The reported gains are clearest for macular diseases that are hard to detect on CFP, such as macular edema and myopic macular schisis. Cross-modal masked reconstruction and source-constrained cross-attention are worth exploring as ways to improve existing CFP-based models, particularly for conditions where depth information matters. Because inference needs only CFP, the approach offers a practical route to more accurate retinal analysis in screening settings.

Key insights

EyeMVP uses paired CFP-OCT pretraining to enrich CFP representations with OCT depth information for improved retinal analysis.

Principles

Pixel-level cross-modal reconstruction enriches CFP representations with OCT supervision.
Cross-attention with structural masks accommodates non-aligned imaging geometries.
Paired CFP-OCT pretraining improves performance on macular and optic nerve tasks.
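The second principle can be illustrated with a minimal NumPy sketch of cross-attention restricted by a boolean mask. This is not EyeMVP's implementation: the direction of attention (OCT-side queries reading from CFP-side keys and values), the token shapes, and the hand-written `allowed` mask are all assumptions made for illustration. In the paper the masks are derived from CFP structure; here the mask is simply given.

```python
import numpy as np


def masked_cross_attention(queries, keys, values, allowed):
    """Scaled dot-product cross-attention restricted by a boolean mask.

    queries: (n_q, d)   target-side tokens (assumed here to be OCT tokens)
    keys:    (n_k, d)   source-side tokens (assumed here to be CFP tokens)
    values:  (n_k, d_v) source-side features to aggregate
    allowed: (n_q, n_k) bool, True where a query may attend to a key
    """
    if not allowed.any(axis=1).all():
        raise ValueError("every query needs at least one allowed key")
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    # Disallowed pairs get -inf so they receive exactly zero weight.
    scores = np.where(allowed, scores, -np.inf)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights


# Hypothetical sizes: 3 OCT-side queries, 5 CFP-side tokens.
rng = np.random.default_rng(0)
oct_tokens = rng.normal(size=(3, 8))
cfp_tokens = rng.normal(size=(5, 8))
cfp_values = rng.normal(size=(5, 4))
allowed = np.array(
    [[1, 1, 0, 0, 0],
     [0, 1, 1, 1, 0],
     [0, 0, 0, 1, 1]],
    dtype=bool,
)
out, weights = masked_cross_attention(oct_tokens, cfp_tokens, cfp_values, allowed)
print(out.shape)  # (3, 4)
```

Restricting each query to a subset of source tokens means no pixel-to-pixel registration between the two modalities is needed; the mask only has to say which regions are plausibly related.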

Method

EyeMVP is pretrained on paired CFP-OCT image triples using cross-modal masked reconstruction, with source-constrained cross-attention and CFP-derived structural masks. At inference it requires only CFP.
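As a rough illustration of what a masked reconstruction objective looks like, the sketch below hides a random subset of patches and scores the reconstruction only on the hidden ones, as is common in masked-autoencoder-style training. The 0.75 mask ratio, the patch dimensions, and the function names are assumptions for illustration; the source does not give EyeMVP's actual masking scheme or loss.

```python
import numpy as np


def random_patch_mask(n_patches, mask_ratio, rng):
    """Return a boolean vector with True for patches hidden from the encoder."""
    n_masked = int(round(n_patches * mask_ratio))
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, size=n_masked, replace=False)] = True
    return mask


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error over masked patches only.

    pred, target: (n_patches, patch_dim) arrays of pixel values
    mask:         (n_patches,) bool, True where the patch was hidden
    """
    if not mask.any():
        raise ValueError("mask must hide at least one patch")
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return per_patch[mask].mean()


rng = np.random.default_rng(0)
target = rng.normal(size=(16, 48))  # hypothetical: 16 patches of 48 values each
mask = random_patch_mask(16, mask_ratio=0.75, rng=rng)  # assumed ratio

# A prediction that is perfect on hidden patches and wrong on visible ones
# still has zero loss, because visible patches are not scored.
pred = target.copy()
pred[~mask] += 1.0
print(int(mask.sum()), masked_reconstruction_loss(pred, target, mask))  # 12 0.0
```

In a cross-modal setting the target would be patches of the other modality (for example OCT pixels reconstructed with help from CFP features), which is how the CFP encoder can absorb depth-related cues while remaining usable on CFP alone at inference.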

In practice

Enhance CFP-based retinal analysis in screening settings.
Improve diagnosis of CFP-challenging macular diseases.
Potentially aid junior ophthalmologists in specific diagnoses.

Topics

EyeMVP
Cross-modal Learning
Retinal Screening
Optical Coherence Tomography
Color Fundus Photography
Macular Disease Diagnosis

Best for: Computer Vision Engineer, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.