Learning Sparse Latent Predictive Foundation Model for Multimodal Neuroimaging
Summary
Neuro-JEPA is a sparse multimodal neuroimaging foundation model designed to learn unified representations across MRI contrasts. It combines a latent predictive objective with a Mixture-of-Experts architecture to encode T1w, T2w, and FLAIR imaging. The model was pretrained on 1,551,862 scans from 428,647 studies, and its architectural, masking, objective, and sparsity design choices were each studied systematically. Evaluated on 25 tasks from three health systems (NYU Langone, NYU Long Island, Massachusetts General Hospital) and 22 tasks from 12 public datasets, Neuro-JEPA consistently outperformed both a simple convolutional neural network baseline and existing neuroimaging foundation models. The results establish a scalable framework for multimodal neuroimaging representation learning.
Key takeaway
AI Scientists and Machine Learning Engineers building neuroimaging models should consider Neuro-JEPA's architectural approach to multimodal representation learning. Its consistent performance across diverse clinical and public datasets suggests that combining a latent predictive objective with a Mixture-of-Experts architecture can outperform both a simple convolutional baseline and existing neuroimaging foundation models. Evaluation protocols should also include simple baselines and clinically heterogeneous cohorts, so that a foundation model's gains are measured against a meaningful reference.
Key insights
Neuro-JEPA is a sparse multimodal neuroimaging foundation model using a latent predictive objective and Mixture-of-Experts.
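The sparse routing idea behind a Mixture-of-Experts layer can be sketched as follows. This is a minimal illustration, not Neuro-JEPA's implementation: the summary does not describe the router, so top-k gating with k=2, the function names (`top_k_route`, `moe_forward`), and the toy experts are all assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalised over the selected experts only, so they sum to 1.
    """
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(token, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(token)
    for idx, weight in top_k_route(gate_logits, k):
        expert_out = experts[idx](token)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out


# Hypothetical toy experts: each one scales the token by a fixed factor.
experts = [lambda t, s=s: [s * v for v in t] for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical gate scores for one token; experts 1 and 3 score highest.
gate_logits = [0.1, 2.0, -1.0, 2.0]
routes = top_k_route(gate_logits, k=2)
mixed = moe_forward([1.0, 2.0], experts, gate_logits, k=2)
```

Only two of the four experts run for this token, which is what makes the layer sparse: parameter count grows with the number of experts while per-token compute stays roughly fixed.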
Principles
- Study architectural, masking, objective, and sparsity design choices systematically.
- Evaluate foundation models against simple baselines and on clinically heterogeneous cohorts.
Method
Neuro-JEPA was pretrained on 1,551,862 scans from 428,647 studies, encoding T1w, T2w, and FLAIR imaging after modality-specific preprocessing.
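A latent predictive objective of the general JEPA kind can be sketched as below: part of the input is masked, and the model is trained to predict the embeddings of the masked patches rather than their raw voxels. This is a hedged sketch under stated assumptions, not the paper's training code. The squared-error loss, the exponential-moving-average (EMA) target encoder, and every name here (`split_patches`, `latent_loss`, `ema_update`) are assumptions commonly associated with JEPA-style training, and the embeddings are made-up placeholders.

```python
def split_patches(num_patches, masked):
    """Partition patch indices into visible context and masked targets."""
    masked = set(masked)
    context = [i for i in range(num_patches) if i not in masked]
    targets = [i for i in range(num_patches) if i in masked]
    return context, targets


def latent_loss(predicted, target):
    """Mean squared error between predicted and target patch embeddings.

    The loss is computed in embedding space, not on raw image intensities.
    """
    total, count = 0.0, 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def ema_update(target_w, online_w, momentum=0.99):
    """Move target-encoder weights slowly toward the online encoder (assumed)."""
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_w, online_w)]


# Six patches, two of them hidden from the context encoder.
context, targets = split_patches(6, masked=[1, 4])

# Placeholder embeddings for the two masked patches (illustrative values only).
predicted = [[0.0, 1.0], [2.0, 2.0]]   # predictor output from the context
target = [[0.0, 3.0], [2.0, 0.0]]      # target-encoder output, no gradient

loss = latent_loss(predicted, target)
new_target_w = ema_update([1.0, 1.0], [0.0, 2.0], momentum=0.5)
```

Predicting in latent space lets the model ignore voxel-level noise that a pixel-reconstruction objective would have to model, which is the usual motivation given for this family of objectives.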
In practice
- Encode brain MRI across T1w, T2w, and FLAIR sequences.
- Learn unified representations at health-system scale.
Topics
- Neuroimaging
- Foundation Models
- Multimodal Learning
- Mixture-of-Experts
- Latent Predictive Models
- MRI
Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.