Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

ViSAE is a neuroscience-motivated mechanistic interpretability toolbox designed to understand and steer Vision Transformers (ViTs) by decomposing their internal representations into human-interpretable concept circuits. It addresses limitations in existing Sparse Autoencoder (SAE) methods, such as poor concept coverage and subjective interpretation. ViSAE features a probing suite with 64K images and a 16K visually grounded concept vocabulary, which improves concept coverage efficiency by 20x over ImageNet and interpretation accuracy by 28.7%. The toolbox includes top-down concept reading and bottom-up circuit tracing algorithms to automatically recover ViT inner workings. Its applications include auditing decision-making processes and steering model behavior, demonstrated by improving worst-group accuracy on WaterBirds by 48.2%, outperforming existing methods by 23.8%.

Key takeaway

For AI engineers deploying Vision Transformers, understanding internal decision processes is crucial for safety and robustness. ViSAE offers a framework for diagnosing spurious correlations and steering model behavior by editing specific concepts. Its concept circuits let you trace information flow through the model and improve worst-group accuracy, which builds trust in deployed models and reduces the risk posed by opaque AI systems.
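Worst-group accuracy is the lowest accuracy over all data subgroups. On Waterbirds, the groups are the combinations of bird type and background. The sketch below shows how the metric is computed in plain Python. The function name, group labels, and toy predictions are illustrative assumptions, not part of ViSAE or of any benchmark tooling.

```python
from collections import defaultdict

def worst_group_accuracy(labels, preds, groups):
    """Return (worst accuracy, per-group accuracy dict)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group

# Toy data (hypothetical): 0 = landbird, 1 = waterbird.
labels = [0, 0, 1, 1, 1, 1, 0]
preds = [0, 0, 1, 1, 0, 1, 0]
groups = [
    "land_on_land", "land_on_land",
    "water_on_water", "water_on_water",
    "water_on_land", "water_on_land",
    "land_on_water",
]

worst, per_group = worst_group_accuracy(labels, preds, groups)
overall = sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)
```

In this toy case the overall accuracy is 6/7, while the worst group (waterbirds on land backgrounds) sits at 0.5. An average score can hide this kind of gap, which is what a spurious background correlation produces.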

Key insights

ViSAE uses neuroscience-inspired concept circuits to interpret and steer Vision Transformers, improving transparency and control.

Principles

Hierarchical visual processing aids concept organization.
Automated concept mapping reduces interpretation subjectivity.
Causal tracing reveals concept interactions across layers.
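The causal-tracing principle can be illustrated with a toy ablation experiment: zero out one upstream concept, recompute the downstream concepts, and record how much each one changes. This is a minimal sketch under strong assumptions. The single linear-plus-ReLU map standing in for the layers between two SAEs, the weight values, and the function names are all hypothetical, and it is not ViSAE's tracing algorithm.

```python
def relu_layer(W, z):
    """Toy stand-in for the computation between two layers: relu(W @ z)."""
    return [max(0.0, sum(w * v for w, v in zip(row, z))) for row in W]

def ablation_effects(W, z_up):
    """effects[i][j] = drop in downstream concept j when upstream concept i is zeroed."""
    baseline = relu_layer(W, z_up)
    effects = []
    for i in range(len(z_up)):
        ablated = list(z_up)
        ablated[i] = 0.0  # counterfactual: upstream concept i is absent
        out = relu_layer(W, ablated)
        effects.append([b - o for b, o in zip(baseline, out)])
    return effects

# Hypothetical weights: rows are downstream concepts, columns are upstream concepts.
W = [[1.0, -1.0],
     [0.5, 0.5]]
z_up = [2.0, 1.0]

effects = ablation_effects(W, z_up)
```

A positive entry means the upstream concept supports the downstream one, and a negative entry means it suppresses it. Keeping only the large-magnitude entries gives a sparse graph that can be read as a circuit.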

Method

ViSAE trains Sparse Autoencoders (SAEs) on ViT internal representations using a probing suite of 64K images and a 16K-concept vocabulary. It then uses CLIP for top-down concept reading and counterfactual interventions for bottom-up circuit tracing.
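For readers unfamiliar with SAEs, the sketch below shows one common formulation: a ReLU encoder, a linear decoder, and a loss that combines reconstruction error with an L1 sparsity penalty. The source does not state which SAE variant ViSAE uses, so the architecture, the bias convention, the `l1_coeff` value, and all names here are assumptions for illustration only.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation x into sparse latents z, then reconstruct it."""
    # Assumed convention: subtract the decoder bias before encoding.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    z = [max(0.0, h + b) for h, b in zip(matvec(W_enc, centered), b_enc)]
    x_hat = [r + b for r, b in zip(matvec(W_dec, z), b_dec)]
    return z, x_hat

def sae_loss(x, z, x_hat, l1_coeff=1e-3):
    """Squared reconstruction error plus an L1 penalty on the latents."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in z)
    return recon + l1_coeff * sparsity

# Tiny example: 2-dimensional activation, 2 latents, identity weights.
W_enc = [[1.0, 0.0], [0.0, 1.0]]  # shape: n_latents x d
W_dec = [[1.0, 0.0], [0.0, 1.0]]  # shape: d x n_latents
b_enc = [0.0, 0.0]
b_dec = [0.0, 0.0]
x = [1.0, -2.0]

z, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, z, x_hat)
```

In a real setting `x` would be a ViT token activation and the latent dimension would be much larger than `d`, so that each active latent can be matched to a single concept.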

In practice

Audit ViT decision pathways for transparency.
Localize abstract concepts directly on pixels.
Steer model behavior via concept editing.
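One common way to edit a concept with an SAE is to rescale a single latent and add the resulting change, along that latent's decoder direction, back to the original activation. This leaves whatever the SAE fails to reconstruct untouched. The sketch below assumes that approach. The function name, arguments, and toy values are hypothetical, and the source does not confirm that ViSAE steers in exactly this way.

```python
def steer_activation(x, z, W_dec, concept_idx, scale=0.0):
    """Rescale one SAE latent and apply the change to the activation x.

    x: original activation (length d)
    z: SAE latents for x (length n_latents)
    W_dec: decoder weights, shape d x n_latents
    scale: 0.0 removes the concept, 1.0 leaves it unchanged, >1.0 amplifies it
    """
    delta = (scale - 1.0) * z[concept_idx]
    direction = [row[concept_idx] for row in W_dec]  # decoder column for the concept
    return [xi + delta * di for xi, di in zip(x, direction)]

# Toy example: suppress concept 1 (say, a "water background" latent).
x = [1.0, 2.0]
z = [0.5, 3.0]
W_dec = [[1.0, 0.0],
         [0.0, 1.0]]

x_steered = steer_activation(x, z, W_dec, concept_idx=1, scale=0.0)
```

The steered activation would then replace the original one in the forward pass, and the change in the model's prediction indicates how much the decision depended on that concept.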

Topics

Vision Transformers
Mechanistic Interpretability
Sparse Autoencoders
Concept Circuits
Model Auditing
Model Steering

Code references

deep-real/ViSAE

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.