Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space
Summary
Alon Bebchuk and Nir Shavit's paper (2605.17704) investigates the mechanism behind the lottery ticket hypothesis in a combinatorial, clause-structured toy setting. The setting admits an interpretable feature-space representation with well-defined combinatorial distances between features. The study finds that "winning tickets" in weight space correspond to precursor locations in feature space that are already close to the final feature-channel codes at initialization. Training the dense network with stochastic gradient descent (SGD) resolves these locations through structured selection: proximal locations either converge or are rejected, and rejection is more frequent at crowded neurons because of competition. The authors define a winning ticket as a family of compatible code locations that balances proximity to the final codes against low inter-feature interference. Sparse retraining often re-expresses the same clause/template family on a different row, which indicates that what is preserved is the family, not the identity of individual rows. Within this toy setting, lightweight probes based on feature-space distance and motion frequently outperform established weight-based ticket discovery methods in accuracy and exact code recovery.
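The trade-off in the winning-ticket definition above (proximity to final codes versus inter-feature interference) can be made concrete with a small sketch. This is illustrative only and is not the paper's formulation: the binary code representation, the function name `family_score`, the overlap-based interference term, and the `penalty` weight are all assumptions introduced here.

```python
from itertools import combinations
from typing import Dict, List

def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def overlap(a: List[int], b: List[int]) -> int:
    """Number of positions where both binary codes are active."""
    return sum(x & y for x, y in zip(a, b))

def family_score(locations: Dict[str, List[int]],
                 final_codes: Dict[str, List[int]],
                 penalty: float = 1.0) -> float:
    """Hypothetical score for a family of code locations (lower is better).

    locations[f]   : assumed binary pattern for feature f at initialization
    final_codes[f] : assumed binary code for feature f after training
    penalty        : assumed weight on the interference term
    """
    # Proximity: total distance from each initial location to its final code.
    proximity = sum(hamming(locations[f], final_codes[f]) for f in final_codes)
    # Interference: shared active positions between different features.
    interference = sum(overlap(locations[f], locations[g])
                       for f, g in combinations(sorted(locations), 2))
    return proximity + penalty * interference

final_codes = {"A": [1, 1, 0, 0], "B": [0, 0, 1, 1]}
crowded = {"A": [1, 1, 0, 0], "B": [0, 1, 1, 0]}  # B overlaps A at position 1
clean = {"A": [1, 0, 0, 0], "B": [0, 0, 1, 1]}    # no shared active positions

print(family_score(crowded, final_codes))
print(family_score(clean, final_codes))
```

Under these assumptions the `clean` family scores lower than the `crowded` one even though one of its locations starts farther from its final code, which mirrors the idea that a good family is compatible as well as close.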
Key takeaway
For research scientists investigating neural network pruning and interpretability, the central point is that winning tickets are governed by hidden feature-space geometry, not only by weight-space subnetwork identity. Feature-space probing methods are worth exploring: in the paper's toy setting they frequently outperform established weight-based approaches in accuracy and exact code recovery, which may lead to more robust and interpretable sparse models.
Key insights
Winning tickets preserve feature-space geometry, not just weight-space subnetwork identity.
Principles
- Winning tickets align with initial feature-space proximity.
- SGD resolves features through structured selection and competition.
- Preserved objects are family-level, not microscopic.
Method
The study uses lightweight probes based on feature-space distance and motion to identify winning tickets. In the combinatorial toy setting, these probes frequently outperform established weight-based ticket discovery methods in accuracy and exact code recovery.
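As a rough illustration of what a distance-and-motion probe could look like, the sketch below binarizes a weight row by keeping its largest-magnitude entries, measures Hamming distance to a target code, and reports how much that distance shrinks between initialization and an early checkpoint. The top-k binarization, the use of Hamming distance, and the names `binarize` and `probe` are assumptions made for this example; the paper's actual probes may be defined differently.

```python
from typing import Dict, List, Sequence

def binarize(row: Sequence[float], k: int) -> List[int]:
    """Mark the k largest-magnitude entries of a weight row as 1, the rest 0."""
    order = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
    top = set(order[:k])
    return [1 if i in top else 0 for i in range(len(row))]

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def probe(init_row: Sequence[float],
          early_row: Sequence[float],
          code: Sequence[int]) -> Dict[str, int]:
    """Hypothetical probe for one row against one target code.

    distance : how far the row's pattern is from the code at initialization
    motion   : how much closer the row has moved by an early checkpoint
    """
    k = sum(code)  # keep as many active entries as the code has
    d_init = hamming(binarize(init_row, k), code)
    d_early = hamming(binarize(early_row, k), code)
    return {"distance": d_init, "motion": d_init - d_early}

code = [1, 1, 0, 0]                 # assumed final code for one feature
init_row = [0.9, 0.1, 0.5, 0.0]     # assumed weights at initialization
early_row = [1.0, 0.6, 0.2, 0.0]    # assumed weights after a few steps

print(probe(init_row, early_row, code))  # {'distance': 2, 'motion': 2}
```

A row with small `distance` and positive `motion` would be a candidate precursor under this sketch; ranking rows by such scores is one plausible way to use feature-space information in place of weight magnitude.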
In practice
- Explore feature-space geometry for ticket discovery.
- Consider family-level preservation in sparse retraining.
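One simple way to check family-level preservation, as opposed to row-level identity, is to compare the multiset of code patterns while ignoring which row holds each pattern. The sketch below assumes codes are available as binary rows and that all-zero rows are unused; `family_signature` and `same_family` are hypothetical helper names, not functions from the paper.

```python
from collections import Counter
from typing import List

def family_signature(rows: List[List[int]]) -> Counter:
    """Multiset of non-empty code patterns, ignoring row positions."""
    return Counter(tuple(r) for r in rows if any(r))

def same_family(dense_rows: List[List[int]],
                sparse_rows: List[List[int]]) -> bool:
    """True if both networks express the same patterns on possibly different rows."""
    return family_signature(dense_rows) == family_signature(sparse_rows)

# Assumed example: the sparse network re-expresses the same codes on other rows.
dense = [[1, 1, 0], [0, 0, 0], [0, 1, 1]]
sparse = [[0, 1, 1], [1, 1, 0], [0, 0, 0]]

print(dense == sparse)             # False: row-by-row identity differs
print(same_family(dense, sparse))  # True: the family of codes is the same
```

A row-by-row comparison would report a mismatch here, while the family-level comparison does not, which is the distinction the second bullet points to.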
Topics
- Lottery Ticket Hypothesis
- Feature Space Geometry
- Combinatorial Interpretability Models
- Sparse Subnetworks
- Weight Space
Best for: Research Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.