Clever Hans Has a Brain Scanner

2026-06-22 · Source: Machine Learning on Medium · Field: Science & Research — Artificial Intelligence & Machine Learning, Health & Medical Research, Research Methodology & Innovation · Depth: Advanced, medium

Summary

The article sets up an experiment around the "Clever Hans" problem in NeuroML: do high-performing brain-encoding models actually capture how the brain responds, or do they exploit shortcuts? The author built a closed image-to-brain-to-image loop from Meta's TribeV2 encoder, the Natural Scenes Dataset (7T fMRI), a CLIP embedding decoder, and a Kandinsky 2.2 diffusion renderer. The loop reconstructs each image twice, once from real fMRI and once from the fMRI that TribeV2 predicts, and compares the results. Initially, real fMRI yielded recognizable scenes (e.g., a living room, CLIP similarity 0.30), while predicted fMRI produced garbled images (CLIP similarity 0.15). Once the decoder was trained on TribeV2's synthetic fMRI, however, the predicted arm produced cleaner reconstructions and sometimes outscored the real-brain arm (e.g., a cafeteria, CLIP similarity 0.56 vs. 0.33). The article attributes this apparent superiority to the predicted fMRI being noise-free and deterministic, which lets the model "win on an easier problem" rather than demonstrate genuine brain understanding.
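The "easier problem" effect can be illustrated with a toy simulation. The sketch below is not the author's pipeline: it assumes a one-dimensional latent per image, a hypothetical "real" arm that measures it with Gaussian noise, and a hypothetical "predicted" arm that is a deterministic function of the same latent. A linear decoder is trained and scored separately on each arm. All names and noise levels are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx


def decode_score(response_fn, latents, n_train):
    """Train a linear decoder on one arm and score it on held-out items."""
    responses = [response_fn(z) for z in latents]
    slope, intercept = fit_line(responses[:n_train], latents[:n_train])
    decoded = [slope * r + intercept for r in responses[n_train:]]
    return pearson(decoded, latents[n_train:])


rng = random.Random(0)  # fixed seed so the run is repeatable
latents = [rng.gauss(0.0, 1.0) for _ in range(2000)]


def real_arm(z):
    # Hypothetical noisy measurement: signal plus Gaussian noise.
    return 2.0 * z + rng.gauss(0.0, 2.0)


def predicted_arm(z):
    # Hypothetical encoder output: same signal, no noise, deterministic.
    return 2.0 * z


real_score = decode_score(real_arm, latents, n_train=1000)
predicted_score = decode_score(predicted_arm, latents, n_train=1000)
print(f"real arm: {real_score:.2f}  predicted arm: {predicted_score:.2f}")
```

In this toy setting the predicted arm scores higher only because its inputs carry no noise; both arms encode the same underlying signal, so the gap says nothing about which one is closer to the brain.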

Key takeaway

AI scientists and research scientists developing brain-encoding models should not treat correlation-based prediction scores as the sole measure of success. A model that scores well, especially on noise-free synthetic data, may be a "Clever Hans" whose apparent understanding is merely shortcut learning. To check whether a model gets brain function right for the right reasons, apply mechanistic interpretation, counterfactuals, and causal interventions such as "trick questions" or component removal.

Key insights

High prediction scores from brain-encoding models may reflect shortcut learning rather than genuine brain understanding, and noise-free synthetic fMRI can inflate apparent performance.

Principles

High prediction scores don't guarantee understanding.
Noise-free synthetic data can create false positives.
Mechanistic interpretation is crucial for NeuroML.

Method

A closed image-to-brain-to-image loop compares reconstructions from real fMRI with reconstructions from predicted fMRI. The decoder is held fixed, and results are anchored between a scrambled-brain control and a real-brain control.
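One way to read "anchoring" is as a rescaling of the predicted arm's similarity between the two controls. The sketch below assumes that CLIP similarity means cosine similarity between embedding vectors, and that the scrambled control acts as a floor and the real-brain control as a ceiling. Both are assumptions, not details confirmed by the article. The function names are hypothetical, and the scrambled value of 0.10 is invented for illustration; only 0.15, 0.30, 0.56, and 0.33 come from the summary.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors given as sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anchored_score(predicted_sim, scrambled_sim, real_sim):
    """Rescale so that 0 is the scrambled control and 1 is the real-brain control."""
    span = real_sim - scrambled_sim
    if span <= 0:
        raise ValueError("real-brain control must score above the scrambled control")
    return (predicted_sim - scrambled_sim) / span


SCRAMBLED = 0.10  # hypothetical floor, not reported in the summary

# Living-room example: predicted 0.15 vs. real 0.30 -> roughly 0.25
living_room = anchored_score(0.15, SCRAMBLED, 0.30)

# Cafeteria example: predicted 0.56 vs. real 0.33 -> above 1
cafeteria = anchored_score(0.56, SCRAMBLED, 0.33)

print(f"living room: {living_room:.2f}  cafeteria: {cafeteria:.2f}")
```

Under this reading, a value above 1 means the predicted arm beat the real-brain arm, which is the case that needs explaining rather than celebrating.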

In practice

Test models with "trick questions" to separate meaning from surface features.
Intervene in models to remove shortcut components.
Use causal interventions beyond correlation-based analysis.
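The first two checks above can be sketched on a toy linear model. Everything here is an illustrative assumption: two hand-made feature channels ("meaning" and "shortcut"), two hypothetical models that weight them differently, and a "trick" set in which the shortcut channel is reversed so that it no longer tracks the target. This is not the article's model or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def predict(weights, features, ablate=()):
    """Linear readout; any component named in `ablate` is removed."""
    n = len(next(iter(features.values())))
    out = [0.0] * n
    for name, w in weights.items():
        if name in ablate:
            continue
        for i, value in enumerate(features[name]):
            out[i] += w * value
    return out


def reliance(weights, features, component):
    """Mean absolute change in the output when one component is removed."""
    full = predict(weights, features)
    cut = predict(weights, features, ablate=(component,))
    return sum(abs(f - c) for f, c in zip(full, cut)) / len(full)


target = [0.0, 1.0, 2.0, 3.0, 4.0]
# Standard set: the shortcut channel happens to track the target.
standard = {"meaning": target, "shortcut": [0.1, 0.9, 2.1, 2.9, 4.1]}
# Trick set: same meaning, but the shortcut channel is reversed.
trick = {"meaning": target, "shortcut": [4.1, 2.9, 2.1, 0.9, 0.1]}

shortcut_model = {"meaning": 0.1, "shortcut": 1.0}  # leans on the shortcut
honest_model = {"meaning": 1.0, "shortcut": 0.1}    # leans on the meaning

for name, model in [("shortcut", shortcut_model), ("honest", honest_model)]:
    std = pearson(predict(model, standard), target)
    trk = pearson(predict(model, trick), target)
    rel = reliance(model, standard, "shortcut")
    print(f"{name}: standard={std:.2f} trick={trk:.2f} reliance={rel:.2f}")
```

Both models correlate strongly with the target on the standard set, so the prediction score alone cannot tell them apart. The trick set and the ablation do: the shortcut-reliant model flips sign on the trick set and changes far more when the shortcut channel is removed.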

Topics

Brain-encoding Models
NeuroML
Shortcut Learning
fMRI Prediction
Mechanistic Interpretation
Image Reconstruction

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.