PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction
Summary
PLOT (Progressive Localization via Optimal Transport) is a framework for localizing causal variables within neural networks. It targets the computational cost of existing methods such as Distributed Alignment Search (DAS), which learns expressive subspace interventions but requires an extensive search over candidate neural sites. PLOT instead uses optimal transport to fit a coupling between abstract variables and candidate neural sites, producing a global soft correspondence that can be calibrated into intervention handles. For larger models, PLOT operates progressively, moving from coarse sites such as tokens or layers to finer supports such as coordinate groups or PCA spans. The resulting localization can also guide DAS, reducing its runtime. Experiments show that transport-only PLOT handles are fast and accurate, while PLOT-guided DAS reaches DAS-level accuracy at substantially lower computational cost.
Key takeaway
For research scientists working on mechanistic interpretability and causal abstraction in neural networks, PLOT offers a faster way to localize causal variables. It is most relevant when working with large models or when the computational cost of methods like DAS is the bottleneck in an experimentation cycle.
Key insights
PLOT localizes neural causal variables efficiently using optimal transport, reducing computational overhead for mechanistic interpretability.
Principles
- Optimal transport can map abstract variables to neural sites.
- Progressive localization refines site identification from coarse to fine.
- Guiding search with localization improves efficiency.
Method
PLOT fits an optimal transport coupling between abstract variables and candidate neural sites, generating a global soft correspondence. This correspondence is then calibrated into intervention handles. For larger models, localization proceeds progressively, from coarse sites (tokens or layers) to finer supports (coordinate groups or PCA spans).
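To make the idea of a soft correspondence concrete, here is a minimal sketch of an entropic optimal transport coupling computed with Sinkhorn iterations. It is an illustration only, not the paper's implementation: the cost matrix, the uniform marginals, the regularization strength `reg`, and the function name `sinkhorn_coupling` are all assumptions, since the summary does not say how PLOT defines its costs or solves the transport problem.

```python
import numpy as np

def sinkhorn_coupling(cost, reg=0.2, n_iters=300):
    """Entropic OT coupling between rows (abstract variables) and
    columns (candidate neural sites), assuming uniform marginals."""
    n_vars, n_sites = cost.shape
    a = np.full(n_vars, 1.0 / n_vars)    # mass assigned to each abstract variable
    b = np.full(n_sites, 1.0 / n_sites)  # mass assigned to each candidate site
    K = np.exp(-cost / reg)              # Gibbs kernel of the cost matrix
    u = np.ones(n_vars)
    v = np.ones(n_sites)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale rows toward marginal a
        v = b / (K.T @ u)                # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Hypothetical costs: a low value means the site behaves like the variable.
cost = np.array([
    [0.1, 0.9, 0.8, 0.9],   # abstract variable 0 vs. four candidate sites
    [0.9, 0.8, 0.1, 0.9],   # abstract variable 1 vs. four candidate sites
])
coupling = sinkhorn_coupling(cost)
best_site = coupling.argmax(axis=1)      # site with the most mass per variable
print(np.round(coupling, 3))
print(best_site)
```

Each row of `coupling` spreads one variable's mass over all candidate sites at once, which is what makes the correspondence global and soft. In this toy example the largest mass for the first variable falls on site 0 and for the second on site 2, matching the low-cost entries.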
In practice
- Use PLOT to identify relevant neural sites faster.
- Apply progressive localization for large models.
- Integrate PLOT to accelerate DAS workflows.
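The coarse-to-fine step and the use of localization to narrow a later search can be sketched as follows. This is a hedged illustration under stated assumptions: the coupling values, the site names, the `children` mapping, and both helper functions are hypothetical, and the summary does not specify how PLOT selects or expands sites.

```python
import numpy as np

def shortlist_sites(coupling, site_names, k=1):
    """For each abstract variable (row), keep the k sites with the most coupling mass."""
    order = np.argsort(-coupling, axis=1)[:, :k]
    return [[site_names[j] for j in row] for row in order]

def expand_to_finer_sites(coarse_shortlist, children):
    """Replace each shortlisted coarse site with its finer child sites."""
    return [[child for site in sites for child in children[site]]
            for sites in coarse_shortlist]

# Hypothetical coarse coupling: two abstract variables over three layers.
coarse_coupling = np.array([
    [0.30, 0.15, 0.05],
    [0.05, 0.10, 0.35],
])
layers = ["layer0", "layer1", "layer2"]

# Hypothetical finer supports inside each layer (e.g. coordinate groups).
children = {
    "layer0": ["layer0/group0", "layer0/group1"],
    "layer1": ["layer1/group0", "layer1/group1"],
    "layer2": ["layer2/group0", "layer2/group1"],
}

coarse = shortlist_sites(coarse_coupling, layers, k=1)
fine_candidates = expand_to_finer_sites(coarse, children)
print(coarse)
print(fine_candidates)
```

The finer candidates could then be passed to a second transport fit, or used as the restricted set of sites that a DAS run searches over, so the expensive step only sees a small part of the model.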
Topics
- Neural Causal Abstraction
- Optimal Transport
- Mechanistic Interpretability
- Progressive Localization
- Distributed Alignment Search
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.