Language Models Learn Universal Representations of Numbers and Here's Why You Should Care
Summary
This research investigates how Large Language Models (LLMs) process and represent numerical information, addressing the apparent conflict between their accurate internal number embeddings and their tendency to make numerical errors in their outputs. The study finds that diverse LLMs, including OLMo 2, Llama 3, and Phi 4, converge to systematic, highly accurate, and universal sinusoidal representations of numbers across their hidden states and input contexts. These representations are consistent across layers and are maintained primarily by residual streams, although input and output embeddings are more distributed than the sparser internal representations. The authors developed universal sinusoidal probes that accurately extract numeric information and attribute up to 94% of arithmetic reasoning errors in models such as Llama 3.2 3B to specific internal layers, particularly for division. The work also shows that multi-token numbers are systematically superposed, with high accuracy for numbers of up to three tokens (up to 10^9).
Key takeaway
Research Scientists developing or fine-tuning LLMs should focus on the internal sinusoidal representations of numbers. Understanding these universal representations allows for the creation of more accurate probing techniques, which can pinpoint specific layers responsible for numerical errors. This insight enables targeted architectural adjustments to improve arithmetic reasoning and overall numerical accuracy, especially for multi-token numbers, potentially reducing errors by 27-64% in operations like division.
Key insights
LLMs use universal, systematic sinusoidal representations for numbers, enabling precise error tracing to specific internal layers.
Principles
- Number representations are sinusoidal and universal across LLMs.
- Residual streams maintain consistency of numeric representations.
- Natural-language probes generalize better than synthetic probes.
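The first principle can be illustrated on synthetic data. The sketch below is not the paper's analysis code: it builds a hypothetical embedding matrix `E` (one row per integer) whose rows follow a hidden period of 10, then uses a Fourier decomposition along the number axis to recover that period. The helper name `dominant_period`, the matrix sizes, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

def dominant_period(E):
    """Estimate the strongest period along the number axis of E (shape: numbers x dims)."""
    n_numbers = E.shape[0]
    centered = E - E.mean(axis=0)                      # remove the constant (DC) component
    power = (np.abs(np.fft.rfft(centered, axis=0)) ** 2).sum(axis=1)
    k = int(np.argmax(power[1:])) + 1                  # skip frequency 0
    return n_numbers / k                               # frequency index -> period

# Synthetic stand-in for number embeddings: sin/cos features with period 10,
# mixed into 16 dimensions by a random matrix, plus a little noise.
rng = np.random.default_rng(0)
n = np.arange(100)
features = np.stack([np.sin(2 * np.pi * n / 10), np.cos(2 * np.pi * n / 10)], axis=1)
E = features @ rng.normal(size=(2, 16)) + 0.01 * rng.normal(size=(100, 16))

print(dominant_period(E))
```

With real models, `E` would instead hold hidden states or embedding rows for the numbers 0 to N-1, and several periods may carry comparable power, so inspecting the full spectrum is usually more informative than a single argmax.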
Method
The study uses Representational Similarity Analysis (RSA) and Fourier decompositions to quantify embedding similarity. It employs a sinusoidal probe, defined as $f_{\sin}(\mathbf{x}) =(\mathbf{W}_{\mathrm{out}}\mathbf{S})^{T}(\mathbf{W}_{\mathrm{in}}\mathbf{x})$, to decode internal representations and track error origins.
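A minimal NumPy sketch of the probe formula above may help make the shapes concrete. The summary does not define the symbols, so the following are assumptions: $\mathbf{x}$ is a hidden state of size `d`, $\mathbf{W}_{\mathrm{in}}$ projects it to `m` dimensions, $\mathbf{S}$ is a fixed sinusoidal basis with one column per candidate number, and the probe returns one score per candidate. The periods, the synthetic hidden state, and the choice of weights are hypothetical; in practice $\mathbf{W}_{\mathrm{in}}$ and $\mathbf{W}_{\mathrm{out}}$ would be learned.

```python
import numpy as np

def sinusoidal_basis(n_numbers, periods):
    """Return S with shape (2 * len(periods), n_numbers): sin and cos of each number."""
    n = np.arange(n_numbers)
    angles = 2 * np.pi * n[None, :] / np.asarray(periods, dtype=float)[:, None]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=0)

def sinusoidal_probe(x, W_in, W_out, S):
    """Compute f_sin(x) = (W_out S)^T (W_in x), giving one score per candidate number."""
    return (W_out @ S).T @ (W_in @ x)

# Assumed setup: 100 candidate numbers, four periods, 64-dimensional hidden states.
S = sinusoidal_basis(100, periods=[2, 5, 10, 100])    # shape (8, 100)

# Synthetic hidden state that encodes the number 42 through a random linear map Q.
rng = np.random.default_rng(0)
Q = rng.normal(size=(64, 8))
x = Q @ S[:, 42]

# Stand-ins for trained weights: W_in inverts Q, W_out is the identity.
W_in = np.linalg.pinv(Q)                               # shape (8, 64)
W_out = np.eye(8)

scores = sinusoidal_probe(x, W_in, W_out, S)           # shape (100,)
print(int(np.argmax(scores)))
```

In this synthetic setup the highest score falls on 42, because the projected state matches exactly one column of the basis. The period 100 is included so that no other number below 100 shares the same phase on every period.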
In practice
- Train probes using natural-language contexts for better generalization.
- Identify specific layers causing arithmetic errors in LLMs.
- Consider architectural refinements based on error-prone layers.
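One hypothetical way to carry out the second step is sketched below. It assumes a probe has already decoded one number per layer for each arithmetic prompt, and it reports the first layer after the last correct decoding. This is an illustrative heuristic, not the attribution procedure from the paper, and the function names and example values are invented for the sketch.

```python
from collections import Counter
from typing import Optional, Sequence

def first_divergent_layer(decoded: Sequence[int], target: int) -> Optional[int]:
    """Return the layer index at which a correct intermediate answer is lost.

    Returns None if the final layer still decodes to the target, and 0 if no
    layer ever decodes to the target (the answer was never formed).
    """
    if decoded[-1] == target:
        return None
    for i in reversed(range(len(decoded))):
        if decoded[i] == target:
            return i + 1
    return 0

def error_layer_histogram(examples):
    """Count how often each layer is the point of divergence across examples."""
    layers = (first_divergent_layer(decoded, target) for decoded, target in examples)
    return Counter(layer for layer in layers if layer is not None)

# Made-up per-layer probe outputs for three prompts whose correct answer is 12.
examples = [
    ([7, 12, 12, 13], 12),   # correct through layer 2, lost at layer 3
    ([7, 12, 12, 12], 12),   # final answer correct, no error to attribute
    ([1, 2, 3, 4], 12),      # never correct at any layer
]
print(error_layer_histogram(examples))
```

Layers that dominate such a histogram would be natural candidates for the closer inspection or refinement suggested in the third step.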
Topics
- Language Models
- Numeric Representations
- Sinusoidal Probes
- Model Interpretability
- Error Tracking
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.NE updates on arXiv.org.