Interpreting Brain Responses to Language with Sparse Features from Language Models
Summary
Augmented Sparse Encoding Models (ASEMs) offer a new framework for interpreting human brain responses to language: they replace dense language model (LM) hidden states with hierarchically organized sparse autoencoder (SAE) features and explicitly include surprisal as a predictor. Applied to a high-field 7T fMRI dataset from eight participants listening to 200 linguistically diverse sentences, the approach validates previous interpretations of voxel populations tuned to processing difficulty and meaning abstractness. The study further identifies a previously uncharacterized voxel population tuned to people-related content. It also shows that the fronto-temporal human language network is predicted by a common set of features, though frontal regions are well explained by surprisal alone. Crucially, brain responses are best explained by LM features capturing general information, indicating a non-trivial correspondence between brain and LM language representation.
Key takeaway
For AI Scientists and Research Scientists developing or evaluating language models against neural data, this work suggests that focusing on sparse, interpretable features from LMs, rather than dense hidden states, can yield more meaningful insights into brain-model alignment. Consider incorporating surprisal as a distinct predictor, especially when analyzing frontal brain regions, to better separate the contributions of different linguistic features to neural responses. This approach offers a clearer path to addressing the "black box" problem in cognitive neuroscience.
Key insights
Sparse autoencoder features, combined with surprisal, can be used to interpret how the human language cortex responds to language.
Principles
- Brain responses align with general information encoded in LMs.
- The fronto-temporal language network is predicted by a common set of features across regions.
- Frontal brain regions are well explained by surprisal alone.
Method
Augmented Sparse Encoding Models (ASEMs) replace dense LM hidden states with hierarchically organized sparse autoencoder (SAE) features and add surprisal as an explicit predictor of fMRI responses.
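To make the idea concrete, here is a minimal sketch of an encoding model whose predictors are sparse features plus one surprisal column. It is an illustration under stated assumptions, not the study's implementation: the ridge regression fit, the synthetic data, the array sizes (apart from the 200 sentences), and the function names `build_design_matrix` and `fit_ridge` are all hypothetical choices made for this example.

```python
import numpy as np

def build_design_matrix(sae_features, surprisal):
    """Append a surprisal column to the sparse feature matrix."""
    return np.column_stack([sae_features, surprisal])

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

rng = np.random.default_rng(0)
n_sentences, n_features, n_voxels = 200, 50, 10  # feature and voxel counts are assumed

# Synthetic stand-ins (assumptions): mostly-zero, non-negative activations
# and one positive surprisal value per sentence.
active = rng.random((n_sentences, n_features)) < 0.1
sae_features = rng.random((n_sentences, n_features)) * active
surprisal = rng.gamma(shape=2.0, scale=2.0, size=n_sentences)

X = build_design_matrix(sae_features, surprisal)

# Simulated voxel responses: a linear readout of the predictors plus noise.
true_W = rng.normal(size=(n_features + 1, n_voxels))
Y = X @ true_W + 0.1 * rng.normal(size=(n_sentences, n_voxels))

# Fit on the first 150 sentences, evaluate on the remaining 50.
W = fit_ridge(X[:150], Y[:150], alpha=1.0)
pred = X[150:] @ W

# Per-voxel Pearson correlation between predicted and held-out responses.
r = np.array([np.corrcoef(pred[:, v], Y[150:, v])[0, 1] for v in range(n_voxels)])
```

Because each row of `W` belongs to one named predictor, the weights can be read per voxel: the last row is the surprisal weight, and the other rows are the weights on individual sparse features.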
In practice
- Use SAE features for interpretable LM-brain alignment studies.
- Consider surprisal as a primary predictor for frontal brain activity.
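Surprisal is conventionally defined as the negative log probability an LM assigns to a token given its context. The sketch below computes it in bits and averages over a sentence. The averaging step and the function names are assumptions made for illustration; the summary does not say how the study aggregates surprisal per sentence.

```python
import math

def token_surprisal_bits(prob):
    """Surprisal of one token in bits: -log2 p(token | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)

def sentence_surprisal_bits(token_probs):
    """Mean per-token surprisal over a sentence (one possible aggregation)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    values = [token_surprisal_bits(p) for p in token_probs]
    return sum(values) / len(values)

# Hypothetical per-token probabilities for a three-token sentence.
example = sentence_surprisal_bits([0.5, 0.25, 0.125])
```

Less probable tokens contribute more bits, so a sentence with unexpected words receives a higher value than one the model finds predictable.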
Topics
- Language Models
- Cognitive Neuroscience
- fMRI Data Analysis
- Sparse Autoencoders
- Neural Decoding
- Brain-LM Alignment
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.