A three-dimensional multi-modal foundation model for optical coherence tomography
Summary
OCTCube-M is a 3D multi-modal foundation model framework for integrated analysis of 3D optical coherence tomography (OCT) volumes and 2D en face (EF) retinal images, addressing limitations of existing computational models for retinal disease diagnosis. The framework uses COEP, a multi-modal contrastive learning method, to integrate OCT with other imaging modalities such as fundus autofluorescence and infrared retinal imaging (IR). Three models were developed: OCTCube (uni-modal), OCTCube-IR (bi-modal), and OCTCube-EF (tri-modal). OCTCube, pre-trained on 26,605 3D OCT volumes (1.62 million 2D slices), achieved state-of-the-art performance in predicting 8 retinal diseases and demonstrated robust generalizability. OCTCube-IR, incorporating 26,685 pairs of OCT and IR images, enabled accurate cross-modality retrieval. OCTCube-EF, trained on over 4 million 2D OCT slices and 400 thousand EF images, excelled at predicting geographic atrophy growth rates across 6 multi-center clinical trials in 23 countries.
Key takeaway
For computer vision engineers developing diagnostic tools for retinal diseases, OCTCube-M offers a robust framework for integrating diverse imaging modalities. Consider using its 3D foundation model architecture and multi-modal contrastive learning approach to improve diagnostic accuracy and prognostic capability, particularly for conditions such as geographic atrophy. The publicly available OCTCube model and code are a practical starting point for building generalizable AI solutions in ophthalmology.
Key insights
OCTCube-M integrates 3D OCT with other retinal imaging modalities for enhanced diagnostic and prognostic capabilities.
Principles
- 3D volumetric data improves diagnostic accuracy.
- Multi-modal integration enhances model generalizability.
- Contrastive learning is effective for multi-modal data fusion.
Method
OCTCube-M employs COEP, a multi-modal contrastive learning method, to integrate 3D OCT with 2D en face, IR, and fundus autofluorescence images, creating uni-modal, bi-modal, and tri-modal foundation models.
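To make the idea of multi-modal contrastive learning concrete, here is a minimal sketch of a symmetric, CLIP-style contrastive loss between paired OCT and en face embeddings. This is a generic illustration, not COEP itself: the summary does not describe COEP's exact loss, so the function names, the temperature value, and the random stand-in embeddings are all assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(oct_emb, ef_emb, temperature=0.07):
    # Hypothetical helper: row i of oct_emb and row i of ef_emb are assumed
    # to come from the same eye/visit (a positive pair); all other rows in
    # the batch act as negatives.
    oct_emb = l2_normalize(np.asarray(oct_emb, dtype=np.float64))
    ef_emb = l2_normalize(np.asarray(ef_emb, dtype=np.float64))
    logits = oct_emb @ ef_emb.T / temperature
    idx = np.arange(logits.shape[0])
    # OCT -> EF: each OCT embedding should pick its own EF partner (rows).
    loss_oct_to_ef = -log_softmax(logits, axis=1)[idx, idx].mean()
    # EF -> OCT: each EF embedding should pick its own OCT partner (columns).
    loss_ef_to_oct = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_oct_to_ef + loss_ef_to_oct)

# Stand-in embeddings; a real pipeline would take these from the encoders.
rng = np.random.default_rng(0)
oct_emb = rng.normal(size=(8, 32))
aligned = symmetric_contrastive_loss(oct_emb, oct_emb + 0.01 * rng.normal(size=(8, 32)))
shuffled = symmetric_contrastive_loss(oct_emb, rng.normal(size=(8, 32)))
print(f"aligned pairs: {aligned:.4f}, unrelated pairs: {shuffled:.4f}")
```

The loss is low when matching pairs are closer to each other than to the other items in the batch, which is the property that lets a shared embedding space support tasks across modalities.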
In practice
- The pre-trained OCTCube model checkpoint is publicly available on the Hugging Face Hub.
- Code and library list for OCTCube are available on GitHub.
- Access to OCTCube-EF for non-commercial research is available upon request.
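For the cross-modality retrieval that OCTCube-IR is reported to enable, a common way to evaluate a shared embedding space is recall@k. The sketch below assumes embeddings have already been extracted; `recall_at_k` is a hypothetical helper, and the arrays are synthetic stand-ins rather than real OCT or IR features.

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, k=1):
    # Hypothetical helper: row i of query_emb (e.g. an OCT volume) is assumed
    # to match row i of gallery_emb (e.g. the paired IR image).
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                               # cosine similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]      # k most similar gallery items
    targets = np.arange(q.shape[0])[:, None]     # index of the true match
    # Fraction of queries whose true match appears among the top k.
    return float((topk == targets).any(axis=1).mean())

# Synthetic stand-ins: IR embeddings are noisy copies of the OCT embeddings.
rng = np.random.default_rng(0)
oct_emb = rng.normal(size=(100, 64))
ir_emb = oct_emb + 0.1 * rng.normal(size=(100, 64))

r1 = recall_at_k(oct_emb, ir_emb, k=1)
r5 = recall_at_k(oct_emb, ir_emb, k=5)
print(f"recall@1 = {r1:.2f}, recall@5 = {r5:.2f}")
```

Swapping the query and gallery arguments gives the IR-to-OCT direction, which is worth reporting separately because the two directions need not score the same.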
Topics
- Optical Coherence Tomography
- Multi-modal Foundation Models
- Retinal Disease Diagnosis
- Geographic Atrophy Prognosis
- Contrastive Learning
Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.