Coarse-to-fine Hierarchical Architecture with Sequential Mamba for Brain Reconstruction

Source: Takara TLDR - Daily AI Papers · Field: Science & Research — Life Sciences & Biology, Mathematics & Computational Sciences, Research Methodology & Innovation · Depth: Expert, medium

Summary

CHASMBrain is a hierarchical, two-stage framework for image-to-fMRI encoding, that is, predicting the fMRI activity a natural image evokes. Its goal is to clarify how visual representations relate to processing in the human visual system. The architecture uses a dual-stream Mamba design that processes global semantic (CLS) tokens and local spatial patch tokens in separate streams, mirroring the functional organization of the visual cortex. Prediction proceeds coarse to fine: Stage 1 predicts denoised ROI-level activations, and Stage 2 refines these into full voxel-level predictions with a Mamba-VAE. On the Natural Scenes Dataset (NSD), CHASMBrain reached a Pearson correlation of 0.429 and an MSE of 0.261, outperforming all evaluated baselines, including ridge regression and DINOv2 linear probes. Causal ablations revealed an asymmetric specialization: the patch stream chiefly supports predictions in early visual cortex, while the CLS stream contributes semantic context to higher-order areas. The learned backbone also generalizes across individuals with minimal per-subject adaptation.
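The two reported numbers are standard encoding-model metrics, but this summary does not say how they are aggregated. The following is a minimal NumPy sketch, assuming Pearson r is computed per voxel over held-out images and then averaged, and MSE is averaged over all image-voxel entries. It also includes a closed-form ridge regression of the kind listed among the baselines. The function names, the synthetic data, and `alpha` are hypothetical, not taken from the paper.

```python
import numpy as np


def voxelwise_pearson(y_true, y_pred, eps=1e-12):
    """Pearson r computed independently for each voxel (column).

    Arrays have shape (n_images, n_voxels).
    """
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / np.maximum(den, eps)  # guard against constant voxels


def encoding_scores(y_true, y_pred):
    """Return (mean voxel-wise Pearson r, MSE over all entries)."""
    r = voxelwise_pearson(y_true, y_pred)
    mse = np.mean((y_true - y_pred) ** 2)
    return float(r.mean()), float(mse)


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 64))        # hypothetical image features
    W_true = rng.standard_normal((64, 200))   # hypothetical feature-to-voxel map
    Y = X @ W_true + 4.0 * rng.standard_normal((500, 200))  # noisy synthetic "fMRI"
    W = fit_ridge(X[:400], Y[:400], alpha=10.0)              # fit on training split
    r, mse = encoding_scores(Y[400:], X[400:] @ W)           # score on held-out split
    print(f"mean r = {r:.3f}, MSE = {mse:.3f}")
```

Because MSE depends on the scale of the responses, it is only comparable across models evaluated on identically preprocessed (for example, z-scored) data. Pearson r is scale-invariant per voxel.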

Key takeaway

For computational neuroscientists building image-to-fMRI encoding models, CHASMBrain offers a concrete architectural blueprint. Its dual-stream Mamba design, which separates global semantic from local spatial processing, combined with a coarse-to-fine prediction strategy, improves fMRI activation prediction over the evaluated baselines on NSD. Consider adopting similar hierarchical Mamba-based designs to improve accuracy and interpretability, especially when cross-subject generalization matters in brain reconstruction tasks. The method is a solid starting point for further work on modeling the visual system.

Key insights

CHASMBrain's dual-stream Mamba, coarse-to-fine fMRI encoding architecture outperforms the evaluated baselines in predictive accuracy and reveals functional specialization in the visual cortex.
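The specialization finding comes from causal ablations, whose exact protocol this summary does not describe. One common form is zero-ablation: remove one stream's input, re-predict, and measure the per-ROI drop in Pearson r. The sketch below assumes that protocol and uses a toy linear `predict` function in place of a trained encoder. The ROI labels ("early", "higher"), the feature sizes, and the synthetic data are hypothetical. Mean-ablation, which replaces inputs with their dataset mean instead of zeros, is an equally valid alternative.

```python
import numpy as np


def pearson_per_roi(y_true, y_pred, roi_labels, eps=1e-12):
    """Mean voxel-wise Pearson r within each ROI; arrays are (n_images, n_voxels)."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    r = (yt * yp).sum(axis=0) / np.maximum(den, eps)  # constant predictions give r = 0
    return {roi: float(r[roi_labels == roi].mean()) for roi in np.unique(roi_labels)}


def stream_ablation(predict, cls_feats, patch_feats, y_true, roi_labels):
    """Per-ROI drop in Pearson r when one stream's input is zeroed out."""
    full = pearson_per_roi(y_true, predict(cls_feats, patch_feats), roi_labels)
    no_cls = pearson_per_roi(y_true, predict(np.zeros_like(cls_feats), patch_feats), roi_labels)
    no_patch = pearson_per_roi(y_true, predict(cls_feats, np.zeros_like(patch_feats)), roi_labels)
    return {
        roi: {"drop_without_cls": full[roi] - no_cls[roi],
              "drop_without_patch": full[roi] - no_patch[roi]}
        for roi in full
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_cls, d_patch = 300, 16, 64
    cls_feats = rng.standard_normal((n, d_cls))
    patch_feats = rng.standard_normal((n, d_patch))
    roi_labels = np.array(["early"] * 50 + ["higher"] * 50)  # hypothetical ROI labels

    # Toy ground truth: "early" voxels depend only on patch features,
    # "higher" voxels only on CLS features.
    W_cls = np.hstack([np.zeros((d_cls, 50)), rng.standard_normal((d_cls, 50))])
    W_patch = np.hstack([rng.standard_normal((d_patch, 50)), np.zeros((d_patch, 50))])

    def predict(c, p):
        """Stands in for a trained dual-stream encoder."""
        return c @ W_cls + p @ W_patch

    y_true = predict(cls_feats, patch_feats) + rng.standard_normal((n, 100))
    for roi, drops in stream_ablation(predict, cls_feats, patch_feats, y_true, roi_labels).items():
        print(roi, {k: round(v, 3) for k, v in drops.items()})
```

In this toy setup, removing the patch stream hurts only the "early" ROI and removing the CLS stream hurts only the "higher" ROI. That is the kind of asymmetry the paper reports, built in here by construction rather than measured.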

Principles

Method

CHASMBrain works in two stages. In Stage 1, a dual-stream Mamba predicts denoised ROI-level fMRI activations. In Stage 2, a Mamba-VAE refines these coarse predictions into full voxel-level predictions.
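To make the data flow concrete, here is a forward-pass-only NumPy sketch of the coarse-to-fine pipeline under several simplifying assumptions. `SelectiveScan` is a bare Mamba-style selective state-space recurrence (input-dependent step size, discretized decay) without Mamba's convolution, gating, or training, and it uses random weights. The CLS stream is assumed to receive a short sequence of CLS tokens, for example from several backbone layers. The patch stream scans patch tokens and mean-pools them. Stage 2 is a VAE-style refiner whose linear encoder and decoder stand in for the paper's Mamba-VAE blocks. All class names, shapes, and sizes are hypothetical.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0)


class SelectiveScan:
    """Toy Mamba-style selective SSM: h_t = exp(dt*A) * h_{t-1} + dt * B_t * x_t, y_t = h_t C_t."""

    def __init__(self, d_model, d_state, rng):
        self.A = -np.exp(rng.standard_normal((d_model, d_state)))  # negative => stable decay
        self.W_delta = 0.1 * rng.standard_normal((d_model, d_model))
        self.W_B = 0.1 * rng.standard_normal((d_model, d_state))
        self.W_C = 0.1 * rng.standard_normal((d_model, d_state))

    def __call__(self, x):  # x: (seq_len, d_model)
        h = np.zeros(self.A.shape)  # hidden state: (d_model, d_state)
        ys = []
        for x_t in x:
            delta = softplus(x_t @ self.W_delta)[:, None]  # input-dependent step: (d_model, 1)
            B_t, C_t = x_t @ self.W_B, x_t @ self.W_C      # input-dependent B, C: (d_state,)
            h = np.exp(delta * self.A) * h + delta * B_t[None, :] * x_t[:, None]
            ys.append(h @ C_t)                              # readout: (d_model,)
        return np.stack(ys)


class Stage1DualStream:
    """Hypothetical Stage 1: CLS stream + patch stream -> coarse ROI-level activations."""

    def __init__(self, d_model, d_state, n_rois, rng):
        self.cls_stream = SelectiveScan(d_model, d_state, rng)
        self.patch_stream = SelectiveScan(d_model, d_state, rng)
        self.head = 0.1 * rng.standard_normal((2 * d_model, n_rois))

    def __call__(self, cls_tokens, patches):
        g = self.cls_stream(cls_tokens)[-1]          # global semantic summary
        l = self.patch_stream(patches).mean(axis=0)  # pooled local spatial features
        return np.concatenate([g, l]) @ self.head    # (n_rois,)


class Stage2Refiner:
    """Hypothetical Stage 2: VAE-style map from ROI activations to voxels (linear stand-ins)."""

    def __init__(self, n_rois, d_latent, n_voxels, rng):
        self.W_mu = 0.1 * rng.standard_normal((n_rois, d_latent))
        self.W_logvar = 0.1 * rng.standard_normal((n_rois, d_latent))
        self.W_dec = 0.1 * rng.standard_normal((d_latent, n_voxels))
        self.rng = rng

    def __call__(self, roi_act, sample=True):
        mu = roi_act @ self.W_mu
        logvar = roi_act @ self.W_logvar
        if sample:  # reparameterization trick: z = mu + sigma * eps
            z = mu + np.exp(0.5 * logvar) * self.rng.standard_normal(mu.shape)
        else:
            z = mu
        kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))  # KL(q(z|roi) || N(0, I))
        return z @ self.W_dec, float(kl)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 32
    stage1 = Stage1DualStream(d_model=d, d_state=8, n_rois=10, rng=rng)
    stage2 = Stage2Refiner(n_rois=10, d_latent=6, n_voxels=500, rng=rng)
    cls_tokens = rng.standard_normal((4, d))  # hypothetical multi-layer CLS tokens
    patches = rng.standard_normal((16, d))    # hypothetical patch tokens
    roi = stage1(cls_tokens, patches)         # coarse: ROI-level
    voxels, kl = stage2(roi)                  # fine: voxel-level
    print(roi.shape, voxels.shape)            # prints: (10,) (500,)
```

Splitting the problem this way lets Stage 1 learn from a small, lower-noise ROI target, while Stage 2 only has to learn how ROI-level activity expands into voxel patterns.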

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.