Identifying Interactions at Scale for LLMs

2026-03-13 · Source: The Berkeley Artificial Intelligence Research Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Explainable AI · Depth: Advanced, medium

Summary

SPEX and ProxySPEX are algorithms for identifying influential interactions in complex machine learning systems, including Large Language Models (LLMs), at scales that were previously out of reach. They address the "complexity at scale" challenge in interpretability research by efficiently capturing how features, training data points, and internal model components interact to drive predictions. SPEX (Spectral Explainer) draws on signal processing and coding theory to perform sparse recovery, relying on the observation that influential interactions are sparse (few in number) and low-degree (each involving only a few components). ProxySPEX improves efficiency further by exploiting the hierarchical structure of interactions, cutting the number of required ablations by roughly 10x while matching SPEX's performance. Both frameworks apply to feature attribution, data attribution, and mechanistic interpretability, which enables analysis of long-context inputs, identification of synergies and redundancies among training data, and study of dependencies between internal model components.

Key takeaway

For research scientists developing or deploying complex ML models, understanding the underlying decision-making processes is crucial for trustworthiness and safety. You should consider integrating SPEX or ProxySPEX into your interpretability toolkit, especially when dealing with high-dimensional data or large models. These frameworks offer a scalable way to uncover critical feature, data, and model component interactions, enabling more targeted debugging, architectural interventions, and data curation strategies.

Key insights

SPEX and ProxySPEX efficiently identify influential interactions in ML models by exploiting sparsity and hierarchy.

Principles

Model behavior emerges from complex dependencies.
Influential interactions are sparse and low-degree.
An influential higher-order interaction often implies that its lower-order subsets are influential too.
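
To make the sparsity, low-degree, and hierarchy principles concrete, here is a minimal brute-force sketch on a toy function. It uses the Möbius (inclusion-exclusion) decomposition as a simple stand-in, not the spectral transform that SPEX is described as using, and `toy_value`, `mobius_coefficients`, and `has_influential_subset` are hypothetical names invented for this illustration. Brute force over all subsets only works for a handful of inputs, which is the scaling problem the frameworks are meant to avoid.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def mobius_coefficients(value_fn, n):
    """Brute-force interaction coefficients m(T) = sum over S in T of (-1)^(|T|-|S|) * f(S).

    Only non-zero coefficients are returned. Cost grows exponentially in n.
    """
    coeffs = {}
    for T in subsets(range(n)):
        total = 0.0
        for S in subsets(T):
            sign = -1 if (len(T) - len(S)) % 2 else 1
            total += sign * value_fn(S)
        if abs(total) > 1e-12:
            coeffs[T] = total
    return coeffs


def toy_value(S):
    """Hypothetical model output when only the inputs in S are kept."""
    v = 0.0
    if 0 in S:
        v += 1.0          # a single input matters on its own
    if {1, 2} <= S:
        v += 2.0          # a pair matters only jointly
    if {1, 2, 3} <= S:
        v -= 0.5          # a triple that builds on the pair
    return v


def has_influential_subset(T, coeffs):
    """True if some proper, non-empty subset of T also has a non-zero coefficient."""
    return any(S in coeffs for S in subsets(T) if 0 < len(S) < len(T))


N = 6
coeffs = mobius_coefficients(toy_value, N)
max_degree = max(len(T) for T in coeffs)

for T, m in sorted(coeffs.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(T), m)
print(f"{len(coeffs)} of {2 ** N} possible interactions are non-zero; max degree {max_degree}")
```

In this toy case only a few of the 64 possible interactions carry weight, none involves more than three inputs, and the degree-3 term sits on top of an influential degree-2 subset, while the pair itself has no influential single-input subset (hence "often", not "always").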

Method

Ablation measures a component's influence by observing how the output changes when that component is removed. SPEX uses signal processing techniques for sparse recovery of interactions. ProxySPEX additionally exploits their hierarchical structure and needs roughly 10x fewer ablations.
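
The ablation idea can be sketched in a few lines. This is an illustrative example under stated assumptions, not the SPEX or ProxySPEX algorithm: `toy_model` is a hypothetical stand-in for a real model call, the component names are made up, and the pairwise formula is ordinary inclusion-exclusion over four ablations.

```python
from itertools import combinations


def toy_model(present):
    """Hypothetical stand-in for a model's output given the set of kept components."""
    score = 0.0
    if "fever" in present:
        score += 1.0
    if "cough" in present:
        score += 0.5
    if "fever" in present and "rash" in present:
        score += 2.0  # contributes only when both are present
    return score


def ablation_influence(model, components, target):
    """Output change when `target` alone is removed from the full input."""
    full = frozenset(components)
    return model(full) - model(full - {target})


def pairwise_interaction(model, components, a, b):
    """Joint effect of removing a and b beyond the sum of their separate effects."""
    full = frozenset(components)
    return (
        model(full)
        - model(full - {a})
        - model(full - {b})
        + model(full - {a, b})
    )


COMPONENTS = ["fever", "cough", "rash", "fatigue"]

influences = {c: ablation_influence(toy_model, COMPONENTS, c) for c in COMPONENTS}
interactions = {
    (a, b): pairwise_interaction(toy_model, COMPONENTS, a, b)
    for a, b in combinations(COMPONENTS, 2)
}

print(influences)
print({pair: v for pair, v in interactions.items() if v != 0.0})
```

Single-component ablation credits both "fever" and "rash" with the joint term, and only the pairwise measurement separates it out. Checking every pair already costs a number of model calls that grows quadratically in the number of components, and higher orders grow faster still, which is why reducing the number of ablations matters.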

In practice

Identify symptoms driving LLM medical diagnoses.
Prune attention heads for task-specific performance.
Select training data by preserving synergies.

Topics

Machine Learning Interpretability
Large Language Models
Feature Attribution
Data Attribution
Mechanistic Interpretability

Code references

mmschlk/shapiq

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by The Berkeley Artificial Intelligence Research Blog.