Clever Hans Has a Brain Scanner
Summary
The article tackles the "Clever Hans" problem in NeuroML: do high-performing brain-encoding models actually capture brain activity, or do they exploit shortcuts? The author built a closed image-to-brain-to-image loop from Meta's TribeV2 encoder, the Natural Scenes Dataset (7T fMRI), a CLIP embedding decoder, and a Kandinsky 2.2 diffusion renderer. The loop compares image reconstructions decoded from real fMRI with reconstructions decoded from TribeV2's predicted fMRI. Initially, real fMRI yielded recognizable scenes (e.g., a living room, CLIP similarity 0.30), while predicted fMRI produced garbled images (CLIP similarity 0.15). After the decoder was trained on TribeV2's synthetic fMRI, however, the predicted arm produced cleaner reconstructions and sometimes outperformed the real-brain arm (e.g., a cafeteria, CLIP similarity 0.56 vs 0.33). The author attributes this apparent superiority to the predicted fMRI being noise-free and deterministic: the model "wins on an easier problem" rather than demonstrating genuine brain understanding.
Key takeaway
If you are an AI Scientist or Research Scientist developing brain-encoding models, do not treat correlation-based prediction scores as the sole metric of success. Strong results may come from noise-free synthetic data, producing a "Clever Hans" scenario in which apparent understanding is merely shortcut learning. Use mechanistic interpretation, counterfactuals, and causal interventions, such as "trick questions" or component removal, to check whether your models predict brain function for the right reasons.
Key insights
High prediction scores from brain-encoding models may reflect shortcut learning rather than true brain understanding, in part because noise-free synthetic data makes the task easier.
Principles
- High prediction scores don't guarantee understanding.
- Noise-free synthetic data can create false positives.
- Mechanistic interpretation is crucial for NeuroML.
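The second principle can be illustrated with a toy simulation. The sketch below is not the article's pipeline. It assumes a linear "brain" that mixes 16 latent embedding dimensions into 64 voxels, optionally adds Gaussian noise, and fits a ridge decoder with NumPy. All names (`simulate`, `fit_ridge`, `mean_cosine`, `noise_std`) and all sizes are hypothetical choices made for illustration.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: solve (X^T X + alpha I) W = X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def mean_cosine(A, B):
    """Mean row-wise cosine similarity between two matrices."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return float(np.mean(num / den))


def simulate(noise_std, n_train=400, n_test=100, n_latent=16, n_voxels=64, seed=0):
    """Decode toy 'embeddings' from toy 'voxels' and score on held-out trials."""
    rng = np.random.default_rng(seed)
    mixing = rng.normal(size=(n_latent, n_voxels))      # assumed linear encoding
    Z = rng.normal(size=(n_train + n_test, n_latent))   # stand-in for image embeddings
    noise = noise_std * rng.normal(size=(n_train + n_test, n_voxels))
    brain = Z @ mixing + noise                          # noise_std=0 mimics predicted fMRI
    W = fit_ridge(brain[:n_train], Z[:n_train])
    Z_hat = brain[n_train:] @ W
    return mean_cosine(Z_hat, Z[n_train:])


clean_score = simulate(noise_std=0.0)   # deterministic, noise-free "synthetic" arm
noisy_score = simulate(noise_std=5.0)   # noisy "real measurement" arm
print(f"noise-free arm: {clean_score:.3f}")
print(f"noisy arm:      {noisy_score:.3f}")
```

In this toy setting the noise-free arm scores higher even though both arms share the same underlying signal, which is the sense in which a higher score can reflect an easier problem rather than better understanding.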
Method
A closed image-to-brain-to-image loop compares reconstructions from real fMRI with reconstructions from predicted fMRI. The decoder is held fixed, and results are anchored between a scrambled-brain control and a real-brain control.
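One way to read "anchoring" is as a rescaling in which the scrambled control maps to 0 and the real-brain arm maps to 1. The sketch below takes that reading as an assumption; the article may compute it differently. The helper names `cosine_similarity` and `anchored_score` are hypothetical, and the scrambled-control similarity of 0.10 is a made-up placeholder because the summary does not report one. The other similarity values are the ones quoted in the summary.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (e.g., CLIP embeddings)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def anchored_score(pred_sim, scrambled_sim, real_sim):
    """Rescale a similarity so that scrambled control = 0 and real-brain arm = 1."""
    span = real_sim - scrambled_sim
    if span == 0:
        raise ValueError("real and scrambled controls are identical; cannot anchor")
    return (pred_sim - scrambled_sim) / span


SCRAMBLED = 0.10  # hypothetical floor, NOT a value from the article

# Living-room example from the summary: real 0.30, predicted 0.15.
living_room = anchored_score(pred_sim=0.15, scrambled_sim=SCRAMBLED, real_sim=0.30)

# Cafeteria example after retraining the decoder: real 0.33, predicted 0.56.
cafeteria = anchored_score(pred_sim=0.56, scrambled_sim=SCRAMBLED, real_sim=0.33)

print(f"living room: {living_room:.2f}")
print(f"cafeteria:   {cafeteria:.2f}")
```

Under this reading, a score above 1 means the predicted arm beat the real-brain arm, which is the case that calls for a closer look rather than celebration.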
In practice
- Test models with "trick questions" to separate meaning from surface features.
- Intervene in models to remove shortcut components.
- Use causal interventions beyond correlation-based analysis.
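The "trick question" idea can be sketched on toy data. The example below is an assumption-heavy illustration, not the author's procedure: two hypothetical linear models see a meaningful feature (`semantic`) and a surface feature (`shortcut`) that normally travels with it. The trick set breaks that link by permuting the shortcut across samples, and the drop in correlation with the target shows which model depended on it. Every name and weight here is invented for illustration.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def trick_question_drop(weights, semantic, shortcut, target, rng):
    """Correlation lost when the shortcut feature is decoupled from meaning."""
    normal_inputs = np.column_stack([semantic, shortcut])
    # Counterfactual inputs: same meaning, shortcut shuffled across samples.
    trick_inputs = np.column_stack([semantic, rng.permutation(shortcut)])
    baseline = pearson(normal_inputs @ weights, target)
    trick = pearson(trick_inputs @ weights, target)
    return baseline - trick


rng = np.random.default_rng(0)
n = 500
semantic = rng.normal(size=n)                      # the feature that truly drives the target
shortcut = semantic + 0.1 * rng.normal(size=n)     # surface cue that usually co-occurs with it
target = semantic + 0.5 * rng.normal(size=n)       # toy "brain response"

shortcut_model = np.array([0.1, 0.9])   # hypothetical model leaning on the surface cue
semantic_model = np.array([0.9, 0.1])   # hypothetical model leaning on meaning

shortcut_drop = trick_question_drop(shortcut_model, semantic, shortcut, target, rng)
semantic_drop = trick_question_drop(semantic_model, semantic, shortcut, target, rng)

print(f"shortcut-heavy model drop: {shortcut_drop:.2f}")
print(f"semantic-heavy model drop: {semantic_drop:.2f}")
```

Both toy models score similarly on ordinary inputs, so a correlation-based score alone cannot tell them apart. Only the counterfactual inputs separate them, which is the argument for interventions over prediction scores.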
Topics
- Brain-encoding Models
- NeuroML
- Shortcut Learning
- fMRI Prediction
- Mechanistic Interpretation
- Image Reconstruction
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.