ILDR: Geometric Early Detection of Grokking

2026-04-24 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new geometric metric, Inter/Intra-class Distance Ratio (ILDR), has been proposed for the early detection of "grokking," a phenomenon where neural networks achieve perfect training accuracy long before validation accuracy improves. Unlike existing methods like weight norm or GrokFast's slow gradient EMA, ILDR is computed on second-to-last layer representations, measuring the ratio of inter-class centroid separation to intra-class scatter. This metric provides an early signal, rising and crossing a threshold at 2.5 times its baseline before the grokking transition is observed in validation accuracy. Evaluated on modular arithmetic and permutation group composition (S5) tasks, ILDR consistently leads the grokking transition by 9 to 73 percent of the training budget, with lead times increasing with task complexity. Across eight random seeds, ILDR leads by 950 +/- 250 steps, reducing training by an average of 18.6 percent when used as an early stopping trigger.

Key takeaway

For research scientists optimizing neural network training, integrating ILDR can significantly improve efficiency. By detecting the onset of grokking up to 73 percent earlier, you can implement early stopping or targeted optimizer interventions, potentially reducing computational costs by nearly 20 percent. Consider implementing ILDR to gain bidirectional control over the grokking transition and enhance generalization.

Key insights

ILDR provides an early, robust geometric signal for grokking, enabling earlier intervention and reduced training.

Principles

Grokking involves early geometric reorganization.
Inter-class separation indicates generalization.
Early stopping can reduce training costs.

Method

Compute ILDR as the ratio of inter-class centroid separation to intra-class scatter on second-to-last layer representations. Trigger intervention when ILDR crosses 2.5x its baseline.

In practice

Apply ILDR for early stopping in neural network training.
Use ILDR to trigger optimizer interventions.
Monitor representation space for generalization cues.

Topics

Grokking
Inter/Intra-class Distance Ratio
Geometric Metrics
Neural Network Generalization
Early Detection

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.