The Discrete-Log Clock: How a Transformer Learns Modular Multiplication
Summary
A new analysis of small transformers learning modular multiplication shows that earlier reports of a "dense" Fourier spectrum were an artifact of analyzing the model in an inappropriate basis. For a transformer trained on a · b mod 113, the researchers apply the multiplicative character transform, the natural Fourier transform on the multiplicative group (ℤ/pℤ)*, and obtain a highly sparse embedding spectrum: its Gini coefficient is 0.58, versus 0.07 under the standard additive DFT. Only 4 key frequencies carry significant energy, and 96.9% of MLP neurons are tuned to a single multiplicative frequency. When neuron activation heatmaps are reordered by the discrete logarithm, they also show 2D-periodic structure. Together, these findings indicate that the transformer implements a "Discrete-Log Clock" algorithm: it reduces multiplication to addition in discrete-log space, mirroring the Clock algorithm for modular addition.
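For reference, the transform and the sparsity measure named above can be written out. This is a sketch under standard conventions, assuming g is a fixed generator (primitive root) of (ℤ/pℤ)*, ℓ_a = log_g a is the discrete logarithm of a, and W(a) is the embedding vector of token a; the source's exact normalization of the transform and of the Gini coefficient is not given here. For p = 113 the group has p − 1 = 112 elements, so there are 112 characters.

```latex
\[
\chi_k(a) = e^{2\pi i\, k\, \ell_a/(p-1)}, \qquad \ell_a = \log_g a, \qquad k = 0, \dots, p-2
\]
\[
\chi_k(ab) = \chi_k(a)\,\chi_k(b) \quad \text{since} \quad \ell_{ab} \equiv \ell_a + \ell_b \pmod{p-1}
\]
\[
\hat{W}(k) = \sum_{a=1}^{p-1} W(a)\,\overline{\chi_k(a)}, \qquad E_k = \lVert \hat{W}(k) \rVert^2
\]
\[
\mathrm{Gini}(E) = \frac{\sum_{j=0}^{n-1} \sum_{k=0}^{n-1} \lvert E_j - E_k \rvert}{2\, n \sum_{k=0}^{n-1} E_k}, \qquad n = p - 1
\]
```

A Gini value near 0 means energy is spread evenly over frequencies; a value near 1 means it is concentrated on a few.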
Key takeaway
For AI scientists investigating how models learn arithmetic, this work suggests re-evaluating analysis techniques: align the Fourier basis with the algebraic structure of the task, for example by using the multiplicative character transform for modular multiplication. This can reveal sparse, interpretable representations, such as the "Discrete-Log Clock" algorithm, in cases where the standard additive DFT shows only a dense, noise-like spectrum. Applying the same methodology to your own transformer models could uncover hidden algorithmic strategies.
Key insights
The transformer learns modular multiplication by reducing it to addition in discrete-log space, revealed by using the correct multiplicative character transform.
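The reduction can be checked directly. The sketch below (Python, standard library only) builds a discrete-log table for p = 113, where 3 is the smallest primitive root, confirms that adding logs mod 112 reproduces a · b mod 113 for every nonzero pair, and adds a Clock-style readout that assembles cos/sin of ℓ_a + ℓ_b from per-input terms via angle-addition identities. The helper names and the four frequencies (3, 17, 29, 41) are hypothetical placeholders; this illustrates the idea and is not the model's learned circuit or the source's code.

```python
import math

P = 113  # modulus from the analysed task (a * b mod 113)


def primitive_root(p: int) -> int:
    """Smallest generator of (Z/pZ)* for a prime p (brute force, fine for small p)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("p must be prime")


G = primitive_root(P)
N = P - 1  # order of the multiplicative group
LOG = {pow(G, k, P): k for k in range(N)}  # discrete-log table: LOG[G**k % P] = k


def mul_via_logs(a: int, b: int) -> int:
    """a * b mod P computed as an addition of discrete logs mod P - 1."""
    return pow(G, (LOG[a] + LOG[b]) % N, P)


def clock_logits(a: int, b: int, freqs=(3, 17, 29, 41)) -> dict:
    """Clock-style readout in discrete-log space (hypothetical frequencies).

    Scores each candidate c with sum_k cos(w_k * (l_a + l_b - l_c)), where the
    angle for l_a + l_b is assembled from per-input cos/sin terms via the
    angle-addition identities, as in the Clock algorithm for addition.
    """
    la, lb = LOG[a], LOG[b]
    logits = {}
    for c in range(1, P):
        lc, score = LOG[c], 0.0
        for k in freqs:
            w = 2 * math.pi * k / N
            cos_ab = math.cos(w * la) * math.cos(w * lb) - math.sin(w * la) * math.sin(w * lb)
            sin_ab = math.sin(w * la) * math.cos(w * lb) + math.cos(w * la) * math.sin(w * lb)
            # cos(x - y) = cos x cos y + sin x sin y, with x = w*(la+lb), y = w*lc
            score += cos_ab * math.cos(w * lc) + sin_ab * math.sin(w * lc)
        logits[c] = score
    return logits


# Exhaustive check of the log-space reduction over all nonzero residues.
assert all(mul_via_logs(a, b) == a * b % P for a in range(1, P) for b in range(1, P))

logits = clock_logits(5, 7)
print(G, max(logits, key=logits.get))  # → 3 35
```

The readout peaks only at the c whose log equals ℓ_a + ℓ_b mod 112, because the frequency 3 is coprime to 112, so no other candidate can make every cosine term equal to 1.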
Principles
- Matching analysis basis to algebraic structure reveals interpretability.
- Transformers can implement complex arithmetic via simpler algebraic transformations.
- Sparse spectral representations indicate underlying algorithmic simplicity.
Method
Analyze the embeddings of a transformer trained on modular multiplication with the multiplicative character transform on (ℤ/pℤ)*, revealing sparse spectral structure and computation carried out in discrete-log space.
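A minimal sketch of the method, assuming NumPy and an embedding matrix W_E with one row per token value 0..p−1: the multiplicative spectrum drops token 0, orders the remaining rows by discrete log, and takes a length-(p−1) DFT, while the additive baseline is a length-p DFT in natural token order. Because no trained model is available here, the example uses a hypothetical synthetic embedding built from four log-frequencies plus noise, so its Gini values are illustrative and are not the source's 0.58 and 0.07. The function names are also illustrative.

```python
import numpy as np

P = 113  # modulus from the analysed task


def primitive_root(p: int) -> int:
    """Smallest generator of (Z/pZ)* for a prime p (brute force, fine for small p)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("p must be prime")


def gini(x: np.ndarray) -> float:
    """Gini coefficient of a non-negative vector: 0 = uniform, near 1 = concentrated."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(2 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)


def additive_spectrum(W: np.ndarray) -> np.ndarray:
    """Energy per additive frequency: length-P DFT over token values 0..P-1."""
    return np.sum(np.abs(np.fft.fft(W, axis=0)) ** 2, axis=1)


def multiplicative_spectrum(W: np.ndarray, g: int) -> np.ndarray:
    """Energy per multiplicative character: drop token 0, order rows by discrete
    log (row i holds token g**i mod p), then take a length-(p-1) DFT."""
    p = W.shape[0]
    order = [pow(g, i, p) for i in range(p - 1)]
    return np.sum(np.abs(np.fft.fft(W[order], axis=0)) ** 2, axis=1)


# Hypothetical stand-in for a trained embedding W_E (vocab P x d_model):
# four discrete-log frequencies plus small Gaussian noise.
rng = np.random.default_rng(0)
g = primitive_root(P)
d_model = 128
log_of = np.zeros(P, dtype=int)
for i in range(P - 1):
    log_of[pow(g, i, P)] = i

W = 0.01 * rng.standard_normal((P, d_model))
for k in (3, 17, 29, 41):  # placeholder frequencies, not the source's
    phase = 2 * np.pi * k * log_of[1:] / (P - 1)
    W[1:] += np.outer(np.cos(phase), rng.standard_normal(d_model))
    W[1:] += np.outer(np.sin(phase), rng.standard_normal(d_model))

g_add = gini(additive_spectrum(W))
g_mul = gini(multiplicative_spectrum(W, g))
print(f"Gini, additive DFT: {g_add:.2f}   multiplicative transform: {g_mul:.2f}")
```

On this synthetic input the additive spectrum comes out nearly flat. One way to see why: the additive DFT of a nontrivial multiplicative character is a Gauss sum, whose magnitude is √p at every nonzero frequency, so a representation organized in log space looks dense in the additive basis.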
In practice
- Apply the multiplicative character transform when analyzing modular multiplication.
- Reorder neuron activations by discrete logarithm to expose periodic structure.
- Look for the matching algebraic basis when analyzing other learned functions.
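The reordering step can be sketched as follows, again assuming NumPy and an activation grid indexed by token values (a, b). The function names and the "share of energy on {0, ±k}" tuning score are hypothetical choices; the criterion the source used to call 96.9% of neurons single-frequency is not specified here. The synthetic neuron ReLU(cos(ω(ℓ_a + ℓ_b))) stands in for a real MLP neuron: in natural token order it looks scrambled, but after reordering by discrete log it depends only on (i + j) mod (p − 1) and shows diagonal stripes.

```python
import numpy as np

P = 113  # modulus from the analysed task


def primitive_root(p: int) -> int:
    """Smallest generator of (Z/pZ)* for a prime p (brute force, fine for small p)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("p must be prime")


def reorder_by_log(grid: np.ndarray, g: int) -> np.ndarray:
    """grid[a, b] is indexed by token values 0..p-1; return R with
    R[i, j] = grid[g**i mod p, g**j mod p] (token 0 is dropped)."""
    p = grid.shape[0]
    order = np.array([pow(g, i, p) for i in range(p - 1)])
    return grid[np.ix_(order, order)]


def dominant_frequency(R: np.ndarray) -> tuple:
    """Best log-frequency k and the share of non-DC 2D-DFT energy that falls on
    bins whose coordinates both lie in {0, k, -k}."""
    n = R.shape[0]
    E = np.abs(np.fft.fft2(R)) ** 2
    E[0, 0] = 0.0  # ignore the constant offset
    total = E.sum()
    best_k, best_frac = 0, 0.0
    for k in range(1, n // 2 + 1):
        idx = sorted({0, k, (n - k) % n})
        frac = float(E[np.ix_(idx, idx)].sum() / total)
        if frac > best_frac:
            best_k, best_frac = k, frac
    return best_k, best_frac


# Hypothetical neuron: ReLU(cos(w * (l_a + l_b))) with log-frequency 17.
g = primitive_root(P)
log_of = np.zeros(P, dtype=int)
for i in range(P - 1):
    log_of[pow(g, i, P)] = i
w = 2 * np.pi * 17 / (P - 1)
acts = np.maximum(np.cos(w * (log_of[:, None] + log_of[None, :])), 0.0)
acts[0, :] = acts[:, 0] = 0.0  # token 0 lies outside (Z/pZ)*

R = reorder_by_log(acts, g)
k, frac = dominant_frequency(R)
# In log order the neuron depends only on (i + j) mod (p - 1): diagonal stripes.
print(k, round(frac, 2), np.allclose(R, np.roll(np.roll(R, 1, axis=0), -1, axis=1)))
```

The ReLU adds harmonics at 2k, 4k, and so on, which is why the dominant frequency captures most but not all of the non-DC energy; thresholding this share would give a per-neuron tuned/untuned label.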
Topics
- Transformers
- Modular Multiplication
- Multiplicative Character Transform
- Discrete Logarithm
- Algebraic Structure
- Interpretability
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.