Coarse-to-fine Hierarchical Architecture with Sequential Mamba for Brain Reconstruction
Summary
CHASMBrain is a hierarchical two-stage framework for image-to-fMRI encoding, that is, predicting the fMRI responses an image evokes, with the aim of understanding how visual representations relate to the human visual system. Its dual-stream Mamba design explicitly separates global semantic tokens (the CLS stream) from local spatial patches (the patch stream), mirroring the functional organization of the visual cortex. It follows a coarse-to-fine strategy: Stage 1 predicts denoised ROI-level activations, and Stage 2 refines them into full voxel-level predictions with a Mamba-VAE. On the Natural Scenes Dataset (NSD), CHASMBrain reached a Pearson correlation of 0.429 and an MSE of 0.261, outperforming every evaluated baseline, including ridge regression and DINOv2 linear probes. Causal ablation experiments revealed an asymmetric specialization: the patch stream mainly serves early visual cortex, while the CLS stream contributes semantic context to higher-order areas. The learned backbone also generalizes across individuals with minimal per-subject adaptation.
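The summary reports Pearson correlation and MSE on NSD without saying how they are aggregated. A common convention, assumed here rather than taken from the paper, is to correlate each voxel's predicted and measured responses across test images and then average over voxels, while MSE is taken over all image-voxel entries. The function names below are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = test images, columns = voxels


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # undefined for constant inputs; treated as 0 here (assumption)
    return cov / (sx * sy)


def mean_voxelwise_pearson(pred: Matrix, true: Matrix) -> float:
    """Correlate each voxel's predicted and measured responses across images, then average."""
    n_vox = len(true[0])
    rs = [pearson([row[v] for row in pred], [row[v] for row in true]) for v in range(n_vox)]
    return sum(rs) / n_vox


def mse(pred: Matrix, true: Matrix) -> float:
    """Mean squared error over all image-voxel entries."""
    errs = [(p - t) ** 2 for pr, tr in zip(pred, true) for p, t in zip(pr, tr)]
    return sum(errs) / len(errs)
```

The two metrics are complementary: correlation ignores the scale and offset of the predictions, while MSE penalizes them, so reporting both gives a fuller picture.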
Key takeaway
For computational neuroscientists developing image-to-fMRI encoding models, CHASMBrain offers a concrete architectural blueprint: split global semantic and local spatial processing into two Mamba streams, and predict coarse ROI-level activations before refining them to voxels. On NSD, this combination outperformed all evaluated baselines. Consider similar hierarchical Mamba-based designs when you need both accuracy and interpretability, particularly if cross-subject generalization matters. The stream-level ablations also make the method a useful starting point for further research on modeling the visual system.
Key insights
CHASMBrain's dual-stream, coarse-to-fine Mamba architecture for fMRI encoding outperforms the evaluated baselines on NSD and reveals functional specialization in the visual cortex.
Principles
- Dual-stream processing (semantic/spatial) mirrors visual cortex organization.
- Coarse-to-fine refinement improves fMRI prediction accuracy.
- Model backbones can capture subject-agnostic visual representations.
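The summary credits the specialization finding to causal ablations but does not describe the protocol. Assuming an ablation means zeroing one stream's input at test time and measuring the per-ROI loss in correlation, the procedure can be sketched as follows. `toy_model`, `make_toy_data`, and `ablation_drop` are hypothetical; the toy model has the specialization wired in by hand, so it only shows how such a test is run and does not reproduce the paper's result.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Vec = List[float]
Sample = Tuple[Vec, List[Vec]]  # (cls_token, patch_tokens)
Model = Callable[[Vec, List[Vec]], Dict[str, float]]


def pearson(x: Vec, y: Vec) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0.0 or sy == 0.0 else cov / (sx * sy)


def toy_model(cls: Vec, patches: List[Vec]) -> Dict[str, float]:
    """Hypothetical two-stream encoder; the stream-to-ROI weighting is set by hand."""
    p = sum(sum(row) for row in patches) / (len(patches) * len(patches[0]))
    c = sum(cls) / len(cls)
    return {"early": 0.9 * p + 0.1 * c, "higher": 0.3 * p + 0.7 * c}


def make_toy_data(n: int, seed: int = 0) -> Tuple[List[Sample], List[Dict[str, float]]]:
    """Random CLS/patch inputs plus noisy synthetic 'measured' ROI responses."""
    rng = random.Random(seed)
    data, targets = [], []
    for _ in range(n):
        cls = [rng.gauss(0.0, 1.0) for _ in range(4)]
        patches = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
        clean = toy_model(cls, patches)
        data.append((cls, patches))
        targets.append({roi: v + rng.gauss(0.0, 0.05) for roi, v in clean.items()})
    return data, targets


def ablation_drop(model: Model, data: List[Sample], targets: List[Dict[str, float]],
                  stream: str) -> Dict[str, float]:
    """Per-ROI drop in Pearson r when one stream ("cls" or "patch") is zeroed out."""
    def run(zeroed: Optional[str]) -> List[Dict[str, float]]:
        outs = []
        for cls, patches in data:
            if zeroed == "cls":
                cls = [0.0] * len(cls)
            elif zeroed == "patch":
                patches = [[0.0] * len(row) for row in patches]
            outs.append(model(cls, patches))
        return outs

    full, ablated = run(None), run(stream)
    drops = {}
    for roi in targets[0]:
        y = [t[roi] for t in targets]
        drops[roi] = pearson([o[roi] for o in full], y) - pearson([o[roi] for o in ablated], y)
    return drops
```

An asymmetric result shows up as a large drop for one ROI group and a small one for the other, with the pattern reversing when the other stream is ablated.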
Method
CHASMBrain uses a two-stage process. First, a dual-stream Mamba predicts denoised ROI-level fMRI activations; second, a Mamba-VAE refines these into full voxel-level predictions.
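The two stages can be sketched in pure Python under heavy simplifying assumptions. `selective_scan` is a toy input-gated recurrence standing in for a Mamba block (no projections, convolution, or learned discretization), the CLS vector is passed through unchanged, and Stage 2 is a linear VAE-style encoder/decoder rather than a Mamba-VAE. All names, dimensions, and weights are hypothetical and untrained; the sketch only shows how data flows from image tokens to ROI values to voxels.

```python
import math
import random
from typing import List

Vec = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def selective_scan(tokens: List[Vec], w_gate: Vec) -> Vec:
    """Toy per-channel recurrence h_t = a_t*h_{t-1} + (1-a_t)*x_t with input-dependent a_t.

    A loose stand-in for Mamba's selective scan; returns the final hidden state.
    """
    h = [0.0] * len(tokens[0])
    for x in tokens:
        for c, xc in enumerate(x):
            a = sigmoid(w_gate[c] * xc)  # gate depends on the current input
            h[c] = a * h[c] + (1.0 - a) * xc
    return h


def linear(weights: List[Vec], x: Vec) -> Vec:
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vec]:
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


class ToyCoarseToFine:
    """Hypothetical two-stage encoder: image tokens -> ROI activations -> voxels."""

    def __init__(self, dim: int, n_roi: int, n_latent: int, n_vox: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_gate = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        self.roi_head = rand_matrix(n_roi, 2 * dim, rng)      # Stage 1 readout
        self.enc_mu = rand_matrix(n_latent, n_roi, rng)       # Stage 2 encoder (mean)
        self.enc_logvar = rand_matrix(n_latent, n_roi, rng)   # Stage 2 encoder (log-variance)
        self.dec = rand_matrix(n_vox, n_latent, rng)          # Stage 2 decoder
        self.rng = rng

    def stage1(self, cls: Vec, patches: List[Vec]) -> Vec:
        """Coarse step: fuse the CLS vector with the scanned patch stream into ROI values."""
        spatial = selective_scan(patches, self.w_gate)
        return linear(self.roi_head, cls + spatial)

    def stage2(self, roi: Vec, sample: bool = False) -> Vec:
        """Fine step: VAE-style latent from ROI values, decoded to voxel predictions."""
        mu = linear(self.enc_mu, roi)
        if sample:  # reparameterization trick, as used when training a VAE
            logvar = linear(self.enc_logvar, roi)
            z = [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
        else:  # deterministic decode from the posterior mean at inference
            z = mu
        return linear(self.dec, z)

    def __call__(self, cls: Vec, patches: List[Vec]) -> Vec:
        return self.stage2(self.stage1(cls, patches))
```

For example, `ToyCoarseToFine(dim=8, n_roi=5, n_latent=4, n_vox=50)` maps one CLS vector and a list of 8-dimensional patch tokens to 5 ROI values and then to 50 voxel predictions. Keeping the ROI vector as an explicit intermediate output means the coarse stage can be inspected on its own.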
In practice
- Use a dual-stream Mamba for hierarchical visual processing.
- Implement coarse-to-fine prediction for fMRI reconstruction.
- Test how well the model generalizes across subjects.
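The summary says the backbone generalizes across individuals with minimal per-subject adaptation but does not specify what that adaptation is. One lightweight option, assumed here, is to keep the shared model frozen and fit only a per-voxel gain and offset for a new subject on a small calibration set, then score held-out images. It also assumes the shared predictions are already expressed in the new subject's voxel space; how CHASMBrain handles this is not stated in the summary. `fit_affine`, `adapt_subject`, and `apply_adapter` are hypothetical helpers, and the small ridge term `lam` on the gain is an arbitrary choice.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = images, columns = voxels


def fit_affine(shared: List[float], measured: List[float], lam: float = 1e-3) -> Tuple[float, float]:
    """Closed-form least squares for measured ~ gain * shared + offset (small ridge on gain)."""
    n = len(shared)
    mx, my = sum(shared) / n, sum(measured) / n
    sxx = sum((x - mx) ** 2 for x in shared)
    sxy = sum((x - mx) * (y - my) for x, y in zip(shared, measured))
    gain = sxy / (sxx + lam)
    return gain, my - gain * mx


def adapt_subject(shared_preds: Matrix, measured: Matrix, lam: float = 1e-3) -> List[Tuple[float, float]]:
    """Fit one (gain, offset) pair per voxel; the shared model itself stays frozen."""
    n_vox = len(measured[0])
    return [
        fit_affine([row[v] for row in shared_preds], [row[v] for row in measured], lam)
        for v in range(n_vox)
    ]


def apply_adapter(shared_preds: Matrix, params: List[Tuple[float, float]]) -> Matrix:
    """Map shared predictions onto the new subject's response scale."""
    return [[g * x + b for x, (g, b) in zip(row, params)] for row in shared_preds]
```

In a held-out evaluation, fit the adapter on the calibration images only, apply it to the remaining images, and score them with the same correlation metric used for within-subject results.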
Topics
- Brain Reconstruction
- fMRI Encoding
- Mamba Architecture
- Computational Neuroscience
- Hierarchical Models
- Natural Scenes Dataset
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.