A three-dimensional multi-modal foundation model for optical coherence tomography

2026-04-24 · Source: Machine learning : nature.com subject feeds · Field: Health & Wellbeing — Medical Devices & Health Technology, Clinical Care & Medical Practice, Medical Specialties & Subspecialties · Depth: Expert, extended

Summary

OCTCube-M is a 3D multi-modal foundation model framework for joint analysis of 3D optical coherence tomography (OCT) volumes and 2D en face (EF) retinal images, addressing limitations of existing computational models for retinal disease diagnosis. The framework uses COEP, a multi-modal contrastive learning method, to integrate OCT with other imaging modalities such as fundus autofluorescence and infrared retinal imaging (IR). Three models were developed: OCTCube (uni-modal), OCTCube-IR (bi-modal), and OCTCube-EF (tri-modal). OCTCube, pre-trained on 26,605 3D OCT volumes (1.62 million 2D slices), achieved state-of-the-art performance in predicting 8 retinal diseases and demonstrated robust generalizability. OCTCube-IR, trained on 26,685 pairs of OCT and IR images, enabled accurate cross-modality retrieval. OCTCube-EF, trained on over 4 million 2D OCT slices and 400 thousand EF images, performed strongly in predicting geographic atrophy growth rates across 6 multi-center clinical trials in 23 countries.
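To make the cross-modality retrieval claim concrete, the sketch below shows one common way such retrieval is scored: recall@k over cosine similarities between paired embeddings. It is an illustration only, not the paper's evaluation code. The function name `recall_at_k`, the embedding sizes, and the synthetic data are assumptions, and the random vectors stand in for real OCT and IR embeddings.

```python
import numpy as np


def recall_at_k(query_emb: np.ndarray, gallery_emb: np.ndarray, k: int) -> float:
    """Fraction of queries whose true match (same row index) is in the top-k results."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                                 # cosine similarity, (n_query, n_gallery)
    top_k = np.argsort(-sims, axis=1)[:, :k]       # indices of the k most similar gallery items
    targets = np.arange(sims.shape[0])[:, None]    # query i should retrieve gallery item i
    return float((top_k == targets).any(axis=1).mean())


# Synthetic stand-ins (assumption): real embeddings would come from the trained encoders.
rng = np.random.default_rng(0)
oct_emb = rng.normal(size=(50, 16))                    # hypothetical OCT volume embeddings
ir_emb = oct_emb + 0.1 * rng.normal(size=(50, 16))     # hypothetical paired IR embeddings

r1 = recall_at_k(oct_emb, ir_emb, k=1)
r5 = recall_at_k(oct_emb, ir_emb, k=5)
print(f"OCT->IR recall@1={r1:.2f}, recall@5={r5:.2f}")
```

Swapping the two arguments gives the IR-to-OCT direction; reporting both is the usual practice for paired-modality retrieval.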

Key takeaway

For computer vision engineers building diagnostic tools for retinal diseases, OCTCube-M offers a framework for integrating diverse imaging modalities. Consider using its 3D foundation model architecture and multi-modal contrastive learning approach to improve diagnostic accuracy and prognostic performance, particularly for conditions such as geographic atrophy. The publicly available OCTCube model and code are a practical starting point for developing generalizable AI tools in ophthalmology.

Key insights

OCTCube-M integrates 3D OCT with other retinal imaging modalities for enhanced diagnostic and prognostic capabilities.

Principles

3D volumetric data improves diagnostic accuracy.
Multi-modal integration enhances model generalizability.
Contrastive learning is effective for multi-modal data fusion.

Method

OCTCube-M employs COEP, a multi-modal contrastive learning method, to integrate 3D OCT with 2D en face, IR, and fundus autofluorescence images, creating uni-modal, bi-modal, and tri-modal foundation models.
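The summary does not spell out COEP's objective, so the following is only a minimal sketch of a generic CLIP-style symmetric contrastive (InfoNCE) loss, the family of methods that multi-modal contrastive pre-training usually draws on. Everything here is an assumption for illustration: the function names, the temperature of 0.07, the embedding shapes, and the random vectors standing in for pooled OCT and en face embeddings.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_contrastive_loss(oct_emb: np.ndarray, ef_emb: np.ndarray,
                               temperature: float = 0.07) -> float:
    """Generic CLIP-style loss: row i of oct_emb and row i of ef_emb are a matched pair."""
    oct_n = l2_normalize(oct_emb)
    ef_n = l2_normalize(ef_emb)
    logits = oct_n @ ef_n.T / temperature    # (batch, batch) similarity matrix
    diag = np.arange(logits.shape[0])        # matched pairs sit on the diagonal
    loss_oct_to_ef = -log_softmax(logits)[diag, diag].mean()
    loss_ef_to_oct = -log_softmax(logits.T)[diag, diag].mean()
    return float((loss_oct_to_ef + loss_ef_to_oct) / 2)


# Synthetic stand-ins (assumption): real inputs would be encoder outputs.
rng = np.random.default_rng(0)
oct_emb = rng.normal(size=(8, 32))                        # stand-in for pooled 3D OCT embeddings
aligned_ef = oct_emb + 0.05 * rng.normal(size=(8, 32))    # well-aligned partner embeddings
random_ef = rng.normal(size=(8, 32))                      # unrelated embeddings

loss_aligned = symmetric_contrastive_loss(oct_emb, aligned_ef)
loss_random = symmetric_contrastive_loss(oct_emb, random_ef)
print(f"aligned pairs: {loss_aligned:.4f}, random pairs: {loss_random:.4f}")
```

The loss is low when each OCT embedding is closest to its own partner and high when partners are indistinguishable from other items in the batch, which is what pulls paired modalities into a shared embedding space during training.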

In practice

The pre-trained OCTCube model checkpoint is publicly available on the Hugging Face Hub.
The code and library list for OCTCube are available on GitHub.
OCTCube-EF is available on request for non-commercial research.

Topics

Optical Coherence Tomography
Multi-modal Foundation Models
Retinal Disease Diagnosis
Geographic Atrophy Prognosis
Contrastive Learning

Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.