From Data Statistics to Feature Geometry: How Correlations Shape Superposition
Summary
A new study introduces Bag-of-Words Superposition (BOWS), a controlled setting in which a model encodes binary bag-of-words representations of internet text, to investigate how feature correlations influence superposition in neural networks. Prior research focused primarily on sparse, uncorrelated features and treated superposition interference as noise to be minimized; this work shows that correlations can instead produce constructive interference. In BOWS, features arrange themselves according to their co-activation patterns, so that features that are active together reinforce one another while ReLUs still suppress false positives. This arrangement, which is particularly prevalent in models trained with weight decay, naturally forms semantic clusters and cyclical structures. It offers an explanation for patterns observed in real language models that the traditional account of superposition did not cover.
Key takeaway
For research scientists investigating neural network interpretability, this work suggests that any account of superposition should include feature correlations. Recognizing that interference can be constructive, not just noise, changes how you might analyze feature representations and design regularization strategies. Consider examining how weight decay influences feature geometry in your own models, since it may reveal semantic clusters and cyclical structures.
Key insights
Feature correlations can enable constructive interference in neural network superposition, forming semantic clusters.
Principles
- Correlated features arrange by co-activation patterns.
- Weight decay promotes constructive superposition arrangements.
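The two principles above can be illustrated with a small numeric sketch. It assumes a tied-weight toy model (compress with `W`, reconstruct with `ReLU(W.T @ h + b)`); the feature directions, the bias value, and the function name `readout` are hypothetical choices for illustration and are not taken from the study.

```python
import numpy as np

def readout(W, b, x):
    """Assumed toy model: compress x to h = W @ x, reconstruct with ReLU(W.T @ h + b)."""
    h = W @ x
    return np.maximum(W.T @ h + b, 0.0)

# Three unit-norm feature directions in a 2-D hidden space (hypothetical values).
# Features 0 and 1 are assumed to co-activate often, so they sit 60 degrees apart;
# feature 2 points away from both.
angles = np.deg2rad([0.0, 60.0, 210.0])
W = np.stack([np.cos(angles), np.sin(angles)])  # shape (2, 3)
b = np.full(3, -0.5)                            # negative bias, chosen by hand

both = readout(W, b, np.array([1.0, 1.0, 0.0]))  # features 0 and 1 active together
alone = readout(W, b, np.array([1.0, 0.0, 0.0])) # feature 0 active on its own

print("both active:", np.round(both, 3))
print("one active: ", np.round(alone, 3))
```

In this hand-built configuration, the overlap between features 0 and 1 raises each one's reconstruction when they fire together, and the negative bias combined with the ReLU keeps feature 1 at zero when only feature 0 is active.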
Method
Bag-of-Words Superposition (BOWS) is a controlled setting in which a model encodes binary bag-of-words representations of text, making it possible to study feature geometry under correlation; ReLUs suppress false positives.
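As a rough illustration of the inputs involved, the sketch below builds binary bag-of-words vectors for a few toy sentences, counts how often words co-occur, and runs one untrained forward pass through a tied-weight ReLU reconstruction. The vocabulary, sentences, hidden size, and the tied-weight architecture are all assumptions made for illustration; the summary does not specify the study's model or data pipeline.

```python
import numpy as np

def bag_of_words(text, vocab):
    """Return a binary vector: 1.0 if the vocabulary word occurs in the text, else 0.0."""
    tokens = set(text.lower().split())
    return np.array([1.0 if word in tokens else 0.0 for word in vocab])

# Hypothetical tiny vocabulary and documents.
vocab = ["monday", "tuesday", "cat", "dog"]
docs = [
    "The cat chased the dog",
    "Monday and Tuesday meetings",
    "dog walk on Monday",
]
X = np.stack([bag_of_words(d, vocab) for d in docs])  # shape (3, 4)

# Co-activation counts: entry (i, j) is the number of documents containing both words.
co_activation = X.T @ X

# One forward pass through an assumed tied-weight model with random, untrained weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, len(vocab)))  # 4 features squeezed into 2 hidden dimensions
b = np.zeros(len(vocab))
reconstruction = np.maximum(X @ W.T @ W + b, 0.0)  # ReLU clips negative readouts

print(X)
print(co_activation)
print(reconstruction.shape)
```

Training `W` and `b` to minimize reconstruction error on such vectors is what would let the co-activation statistics shape the learned geometry; this sketch stops at the forward pass.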
In practice
- Explore feature geometry in language models.
- Experiment with weight decay and check whether it strengthens semantic clustering.
Topics
- Mechanistic Interpretability
- Superposition
- Feature Geometry
- Bag-of-Words Superposition
- Neural Networks
Best for: Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.