A foundation model of vision, audition, and language for in-silico neuroscience | Research - AI at Meta

2026-03-26 · Source: ai.meta.com via Google News · Field: Science & Research — Life Sciences & Biology, Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

TRIBE v2 is a tri-modal foundation model that aims to unify cognitive neuroscience by predicting human brain activity across a range of naturalistic and experimental conditions. It integrates video, audio, and language inputs and was trained on more than 1,000 hours of fMRI data from 720 subjects. According to the article, TRIBE v2 is markedly more accurate than traditional linear encoding models and predicts high-resolution brain responses to novel stimuli, tasks, and subjects. The model also supports in-silico experimentation: it replicates findings from decades of empirical research in visual and neuro-linguistic paradigms. Finally, its interpretable latent features reveal the fine-grained topography of multisensory integration, which positions artificial intelligence as a cohesive framework for understanding human brain function.

Key takeaway

For AI Scientists and Research Scientists focused on computational neuroscience, TRIBE v2 offers a powerful tool for exploring brain function. You should consider integrating such tri-modal foundation models into your research to move beyond fragmented, specialized models. This approach can accelerate the discovery of unified cognitive principles and enable efficient in-silico validation of hypotheses, potentially reducing the need for extensive empirical studies.

Key insights

TRIBE v2 unifies cognitive neuroscience by predicting human brain activity across diverse conditions using a tri-modal foundation model.

Principles

Unified multimodal models can outperform traditional linear encoding models.
AI can replicate empirical neuroscience findings.

Method

TRIBE v2 processes video, audio, and language inputs to predict brain activity. It was trained on more than 1,000 hours of fMRI data from 720 subjects, and its latent features can be extracted and interpreted to map multisensory integration.
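
For context, the "traditional linear encoding model" that TRIBE v2 is compared against can be sketched as a ridge regression from stimulus features to voxel responses, scored by held-out correlation. This is a minimal illustration on synthetic data, not TRIBE v2's code: the feature matrix, voxel count, noise level, and `alpha` value are all assumptions chosen for the example.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-ins (assumed): rows are time points, columns are
# stimulus features (X) or voxels (Y).
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 400, 100, 20, 50
W_true = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ W_true + rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ W_true + rng.normal(size=(n_test, n_voxels))

# Fit on training time points, then score predictions on held-out ones.
W_hat = fit_ridge(X_train, Y_train, alpha=10.0)
scores = voxelwise_pearson(Y_test, X_test @ W_hat)
print(f"mean held-out correlation: {scores.mean():.3f}")
```

In a real analysis the features would come from a pretrained network applied to the stimulus and `alpha` would be chosen by cross-validation; a model like TRIBE v2 replaces the linear map with a learned nonlinear one.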

In practice

Use TRIBE v2 for in-silico neuroscience experiments.
Apply tri-modal models for brain activity prediction.
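
An in-silico experiment of the kind described above amounts to feeding a trained encoder two sets of stimuli and contrasting the predicted responses. The sketch below shows only that pattern. `toy_predict` is a hypothetical stand-in (a fixed random linear readout), not the TRIBE v2 interface, and the stimuli are synthetic feature vectors.

```python
import numpy as np

def in_silico_contrast(predict_fn, stimuli_a, stimuli_b):
    """Mean predicted response to condition A minus condition B, per voxel."""
    resp_a = np.stack([predict_fn(s) for s in stimuli_a])
    resp_b = np.stack([predict_fn(s) for s in stimuli_b])
    return resp_a.mean(axis=0) - resp_b.mean(axis=0)

# Hypothetical stand-in for a trained brain encoder: maps a stimulus
# feature vector to predicted activity in n_voxels voxels.
rng = np.random.default_rng(0)
n_features, n_voxels = 8, 5
readout = rng.normal(size=(n_features, n_voxels))

def toy_predict(features):
    return features @ readout

# Two conditions that differ only in feature 0 (on in A, off in B).
base = rng.normal(size=(30, n_features))
cond_a = base.copy()
cond_a[:, 0] = 1.0
cond_b = base.copy()
cond_b[:, 0] = 0.0

contrast = in_silico_contrast(toy_predict, cond_a, cond_b)
print(contrast)
```

Because the stand-in model is linear and the conditions differ only in feature 0, the contrast recovers that feature's readout weights exactly. With a nonlinear encoder the same procedure would yield a predicted contrast map that can be compared against published empirical results.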

Topics

TRIBE v2
Foundation Models
In-silico Neuroscience
Brain Activity Prediction
Multisensory Integration

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by ai.meta.com via Google News.