The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

A new analysis of small transformers learning modular multiplication argues that previously reported "dense" Fourier spectra were an artifact of analyzing the model in the wrong basis. For a transformer trained on a · b mod 113, the researchers apply the multiplicative character transform, the natural Fourier transform on the multiplicative group (ℤ/pℤ)*, and find a highly sparse embedding spectrum: a Gini coefficient of 0.58, compared with 0.07 under the standard additive DFT. Only 4 key frequencies carry significant energy, and 96.9% of MLP neurons are tuned to a single multiplicative frequency. Neuron activation heatmaps also show 2D-periodic structure once inputs are reordered by discrete logarithm. Together, these findings indicate that the transformer implements a "Discrete-Log Clock" algorithm: it reduces multiplication to addition in discrete-log space, mirroring the Clock algorithm for modular addition.
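To make the sparsity figures concrete, here is a minimal sketch of a Gini coefficient applied to a spectrum's per-frequency energies. The summary does not say which Gini definition the researchers use, so the standard sorted-rank formula below is an assumption, and the two toy spectra are invented for illustration only.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means energy is spread evenly; values near 1 mean it is
    concentrated in a few entries. Standard sorted-rank formula
    (an assumption; the summarized work may define it differently).
    """
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0.0:
        return 0.0
    # G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Toy spectra (hypothetical): 56 frequency bins each.
flat = [1.0] * 56                  # energy spread evenly: dense spectrum
peaked = [1.0] * 4 + [0.0] * 52    # energy in 4 bins only: sparse spectrum

print("flat:", round(gini(flat), 3), "peaked:", round(gini(peaked), 3))
```

A higher value indicates a sparser spectrum, which is the sense in which 0.58 versus 0.07 favors the multiplicative basis.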

Key takeaway

For AI Scientists investigating how models learn arithmetic, this work suggests re-evaluating analysis techniques. You should consider aligning your Fourier transform basis with the algebraic structure of the task, such as using the multiplicative character transform for modular multiplication. This approach can reveal sparse, interpretable representations, like the "Discrete-Log Clock" algorithm, where standard methods might only show noise. Applying this methodology could uncover hidden algorithmic strategies in your own transformer models.

Key insights

The transformer learns modular multiplication by reducing it to addition in discrete-log space, a structure that becomes visible when the spectrum is computed with the multiplicative character transform.
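The reduction itself can be checked directly. The sketch below illustrates the underlying number theory only, not the model's learned circuit: it finds a primitive root g of 113 by brute force, builds a discrete-log table, and confirms that a · b mod p equals g raised to the sum of the two logs. All function and variable names are hypothetical.

```python
P = 113  # modulus from the summarized experiment


def find_primitive_root(p):
    """Smallest g whose powers reach every nonzero residue mod p (brute force)."""
    for g in range(2, p):
        seen = set()
        x = 1
        for _ in range(p - 1):
            x = x * g % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no primitive root found; is p prime?")


G = find_primitive_root(P)

# EXP[k] = g^k mod p, and DLOG inverts it: DLOG[g^k mod p] = k
EXP = [pow(G, k, P) for k in range(P - 1)]
DLOG = {value: k for k, value in enumerate(EXP)}


def mul_via_dlog(a, b):
    """a * b mod P for nonzero a, b, computed by adding exponents mod P - 1."""
    return EXP[(DLOG[a] + DLOG[b]) % (P - 1)]


# Exhaustive check over all nonzero pairs.
ok = all(mul_via_dlog(a, b) == a * b % P
         for a in range(1, P) for b in range(1, P))
print("primitive root:", G, "| all products match:", ok)
```

Zero has no discrete logarithm, so this sketch covers only nonzero inputs; how the trained model handles a = 0 or b = 0 is not stated in this summary.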

Principles

Method

Analyze transformer embeddings for modular multiplication using the multiplicative character transform on (ℤ/pℤ)* to reveal sparse spectral structure and discrete-log space operations.
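One way to sketch this method: for a prime p with primitive root g, the multiplicative character transform amounts to reordering tokens as g^0, g^1, ..., g^(p-2) and taking an ordinary DFT of length p - 1 along that order. The example below uses a synthetic embedding table, not weights from the summarized model, and the key frequencies, dimension count, and helper names are all assumptions for illustration. It uses only the standard library; an FFT routine would be the usual choice for larger p.

```python
import cmath
import math

P = 113                      # modulus from the summarized experiment
N = P - 1                    # order of the multiplicative group (Z/pZ)*
KEY_FREQS = [3, 11, 20, 41]  # hypothetical key frequencies, demo only
DIMS = 8                     # hypothetical embedding width


def primitive_root(p):
    """Smallest generator of (Z/pZ)*, found by brute force."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p prime?")


G = primitive_root(P)
DLOG = {pow(G, j, P): j for j in range(N)}  # DLOG[g^j mod p] = j


def embed(a, d):
    """Synthetic embedding: a cosine in log_g(a). Token 0 has no log, so it maps to 0."""
    if a == 0:
        return 0.0
    k = KEY_FREQS[d % len(KEY_FREQS)]
    return math.cos(2 * math.pi * k * DLOG[a] / N + 0.7 * d)


E = [[embed(a, d) for d in range(DIMS)] for a in range(P)]


def dft_energy(signal):
    """|DFT|^2 over the non-redundant half-spectrum of a real signal."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * j / n)
                    for j, x in enumerate(signal))) ** 2
            for k in range(n // 2 + 1)]


def spectrum(rows):
    """Energy per frequency, summed over embedding dimensions."""
    per_dim = [dft_energy([row[d] for row in rows]) for d in range(DIMS)]
    return [sum(col) for col in zip(*per_dim)]


def top_fraction(energy, m=4):
    """Share of total energy held by the m strongest frequencies."""
    return sum(sorted(energy, reverse=True)[:m]) / sum(energy)


# Additive DFT: tokens in natural order 0, 1, ..., p-1 (length p).
additive = spectrum(E)
# Multiplicative transform: tokens in order g^0, g^1, ..., g^(p-2) (length p-1).
multiplicative = spectrum([E[pow(G, j, P)] for j in range(N)])

add_top4 = top_fraction(additive)
mult_top4 = top_fraction(multiplicative)
print(f"top-4 energy share: additive {add_top4:.3f}, multiplicative {mult_top4:.3f}")
```

On this synthetic table the same signal looks dense in the additive basis and concentrates on the four chosen frequencies in the multiplicative one, which is the qualitative contrast the summary describes for the trained model.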

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.