Explaining Attention with Program Synthesis
Summary
The paper proposes approximating components of deep networks with executable programs, targeting attention heads in transformer language models. The method computes attention matrices on training examples, then prompts a pre-trained language model to generate Python programs that reproduce those attention patterns from the input text. The candidate programs are then re-ranked by how well they predict attention on held-out inputs. Fewer than 1,000 generated programs reproduce attention patterns in GPT-2, TinyLlama-1.1B, and Llama-3B with an average Intersection-over-Union similarity above 75% on TinyStories. Replacing 25% of attention heads with these programmatic surrogates raises perplexity by only 16% on average across the three models while maintaining performance on several downstream question answering benchmarks, a step toward symbolically transparent models.
Key takeaway
For AI scientists working on transformer interpretability, this research offers a scalable pipeline for reverse-engineering attention heads into human-readable Python programs. In the reported experiments, replacing 25% of attention heads with these programmatic surrogates increased perplexity by 16% on average while preserving downstream task performance. Consider applying this program synthesis method when you need a readable account of what individual attention heads compute.
Key insights
Program synthesis can create human-readable code to explain and replace transformer attention heads.
Principles
- Neural computations can be approximated symbolically.
- Executable programs can mimic attention patterns.
- Symbolic surrogates can maintain model performance.
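To make the second principle concrete, here is a minimal sketch of what a programmatic surrogate for one attention head might look like. It is a hand-written illustration, not a program from the article: the function name `previous_token_head` and the choice of a "previous token" pattern are assumptions made for this example.

```python
import numpy as np


def previous_token_head(tokens: list[str]) -> np.ndarray:
    """Hypothetical surrogate: each position attends to the token before it.

    Returns an (n, n) matrix whose row i is the attention distribution
    of query position i over key positions 0..n-1.
    """
    n = len(tokens)
    attn = np.zeros((n, n))
    if n == 0:
        return attn
    # The first token has no predecessor, so it attends to itself.
    attn[0, 0] = 1.0
    for i in range(1, n):
        attn[i, i - 1] = 1.0
    return attn


pattern = previous_token_head(["The", "cat", "sat"])
```

Each row sums to 1 and no position attends to a later one, so the output has the same shape and causal structure as an attention matrix from a decoder-only model.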
Method
The approach computes attention matrices on training examples, prompts an LLM to generate Python programs that reproduce those patterns, then re-ranks the programs by how well they predict attention on held-out inputs.
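The re-ranking step can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: it assumes Intersection-over-Union is computed on attention matrices binarized at a threshold of 0.5, and the names `iou`, `rerank`, `attend_previous`, and `attend_first` are hypothetical. The two candidate programs stand in for LLM-generated ones, and the held-out "observed" attention is toy data.

```python
import numpy as np


def iou(pred: np.ndarray, target: np.ndarray, threshold: float = 0.5) -> float:
    """IoU between two attention matrices after binarizing at `threshold` (assumed)."""
    p = pred >= threshold
    t = target >= threshold
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 1.0  # both patterns are empty, so they agree
    return float(np.logical_and(p, t).sum() / union)


def rerank(candidates, held_out):
    """Sort candidate programs by mean IoU on held-out (tokens, attention) pairs."""
    scores = {
        name: float(np.mean([iou(program(tokens), attn) for tokens, attn in held_out]))
        for name, program in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def attend_previous(tokens):
    """Candidate 1: each position attends to the previous token."""
    n = len(tokens)
    out = np.zeros((n, n))
    out[0, 0] = 1.0
    for i in range(1, n):
        out[i, i - 1] = 1.0
    return out


def attend_first(tokens):
    """Candidate 2: every position attends to the first token."""
    n = len(tokens)
    out = np.zeros((n, n))
    out[:, 0] = 1.0
    return out


# Toy held-out set in which the observed head behaves like a previous-token head.
held_out = [(toks, attend_previous(toks)) for toks in (["a", "b", "c"], ["x", "y", "z", "w"])]
ranking = rerank({"attend_previous": attend_previous, "attend_first": attend_first}, held_out)
```

On this toy data `attend_previous` ranks first because it matches the observed pattern exactly, while `attend_first` only overlaps on the first two rows.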
In practice
- Reverse-engineer transformer attention heads.
- Replace neural components with symbolic code.
- Improve neural model transparency.
Topics
- Program Synthesis
- Transformer Models
- Attention Mechanisms
- Explainable AI
- Model Interpretability
- Symbolic AI
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.