Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space

2026-05-18 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Alon Bebchuk and Nir Shavit's paper (2605.17704) investigates the mechanisms behind the lottery ticket hypothesis in a combinatorial, clause-structured toy setting. This setting yields an interpretable feature-space representation with well-defined combinatorial distances between features. The study finds that "winning tickets" in weight space correspond to precursor locations in feature space that are already close to the final feature-channel codes at initialization. Dense training with stochastic gradient descent (SGD) resolves these locations through structured selection: proximal locations either converge or are rejected, and rejection is more frequent at crowded neurons, where candidates compete. The authors define a winning ticket as a family of compatible code locations that balances proximity to the final codes against low inter-feature interference. Sparse retraining often re-expresses the same clause/template family on a different row, which indicates that what is preserved is the family, not the identity of individual rows. Within this toy setting, lightweight probes based on feature-space distance and motion frequently outperform established weight-based ticket discovery methods in both accuracy and exact code recovery.
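The trade-off in that definition (closeness to the final codes versus interference between features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the name `family_cost`, the tuple layout `(neuron_id, feature_id, distance_to_final_code)`, the use of "pairs of family members sharing a neuron" as a stand-in for interference, and the `interference_weight` parameter are all hypothetical.

```python
from collections import Counter

def family_cost(family, interference_weight=1.0):
    """Toy cost for a candidate family of code locations (lower is better).

    `family` is a list of (neuron_id, feature_id, distance_to_final_code).
    This layout and the cost itself are illustrative assumptions,
    not the definition used in the paper.
    """
    # Proximity term: total distance of the chosen locations to their final codes.
    proximity = sum(distance for _, _, distance in family)

    # Interference proxy: count pairs of family members that share a neuron,
    # so crowded neurons are penalised.
    per_neuron = Counter(neuron for neuron, _, _ in family)
    crowded_pairs = sum(k * (k - 1) // 2 for k in per_neuron.values())

    return proximity + interference_weight * crowded_pairs

# Three features spread over three neurons, slightly farther from their codes.
spread_family = [(0, "A", 1), (1, "B", 2), (2, "C", 1)]
# Three features all packed onto one neuron, each very close to its code.
crowded_family = [(0, "A", 1), (0, "B", 1), (0, "C", 1)]

spread_cost = family_cost(spread_family)
crowded_cost = family_cost(crowded_family)
```

Under these assumed weights the spread-out family scores lower (better) than the crowded one even though its locations start slightly farther from their codes, which mirrors the summary's point that proximity alone does not decide which locations survive.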

Key takeaway

For research scientists working on neural network pruning and interpretability, the central point is that winning tickets are governed by hidden feature-space geometry, not only by the identity of a weight-space subnetwork. Feature-space probing methods are worth exploring: in the paper's toy setting they frequently beat traditional weight-based approaches on accuracy and exact code recovery, which suggests a possible route to more robust and interpretable sparse models.

Key insights

Winning tickets preserve feature-space geometry, not just weight-space subnetwork identity.

Principles

Winning tickets align with initial feature-space proximity.
SGD resolves features through structured selection and competition.
Preserved objects are family-level, not microscopic.

Method

The study identifies winning tickets with lightweight probes based on feature-space distance and motion. In its combinatorial toy setting, these probes frequently outperform established weight-based ticket discovery methods.
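As a rough illustration of what a distance-and-motion probe might look like, the sketch below ranks candidate locations by how close their binary code is to a target code at initialization and by how much that distance shrinks at an early checkpoint. This is not the paper's probe: the binary-code representation, the use of Hamming distance as the combinatorial distance, the function names, and the `motion_weight` parameter are all assumptions made for the example.

```python
def hamming(a, b):
    """Number of positions where two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def probe_scores(init_codes, early_codes, target, motion_weight=0.5):
    """Score each candidate location; lower means a more promising candidate.

    distance = Hamming distance to `target` at initialization
    motion   = reduction in that distance by an early checkpoint
    score    = distance - motion_weight * motion   (hypothetical combination)
    """
    scores = []
    for code_at_init, code_early in zip(init_codes, early_codes):
        d_init = hamming(code_at_init, target)
        d_early = hamming(code_early, target)
        motion = d_init - d_early  # positive when moving toward the target
        scores.append(d_init - motion_weight * motion)
    return scores

def rank_candidates(scores):
    """Candidate indices ordered from most to least promising."""
    return sorted(range(len(scores)), key=lambda i: (scores[i], i))

# Hypothetical data: one target code and three candidate locations.
target = (1, 1, 0, 0)
init_codes = [(1, 1, 0, 1), (0, 0, 1, 1), (1, 0, 0, 0)]
early_codes = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 0, 1)]

scores = probe_scores(init_codes, early_codes, target)
ranking = rank_candidates(scores)
```

Candidates 0 and 2 start equally close to the target, but candidate 0 moves toward it while candidate 2 drifts away, so the motion term separates them. A probe of this shape needs only codes at two checkpoints and never inspects weight magnitudes.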

In practice

Explore feature-space geometry for ticket discovery.
Consider family-level preservation in sparse retraining.

Topics

Lottery Ticket Hypothesis
Feature Space Geometry
Combinatorial Interpretability Models
Sparse Subnetworks
Weight Space

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.