Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency
Summary
A new study evaluated 38 large language models (LLMs) on over 8,900 scholarly references, using an automated verification system, to establish a scaling law for factual recall. The research found that recall quality follows a sigmoid function of a log-linear combination of two variables: the model's parameter count and the topic's frequency in the training data. Together, these two variables explain 60% of the variance across 16 dense models from four distinct families, and 74-94% of the variance within individual model families. The relationship is consistent with a superposition-inspired model in which factual recall is gated by a signal-to-noise ratio: signal strength scales with concept frequency, and the noise floor scales with model capacity.
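The functional form described above can be sketched in a few lines of Python. This is only an illustration of "a sigmoid of a log-linear combination": the function name `predicted_recall` and the coefficients `a`, `b`, and `c` are hypothetical placeholders, not values reported by the study.

```python
import math


def predicted_recall(n_params, topic_freq, a=0.5, b=0.5, c=-12.0):
    """Sigmoid of a log-linear combination of model size and topic frequency.

    n_params:   model parameter count (e.g. 7e9)
    topic_freq: how often the topic appears in the training data (a count)
    a, b, c:    HYPOTHETICAL coefficients, chosen only for illustration
    """
    # Log-linear combination of the two predictors.
    z = a * math.log10(n_params) + b * math.log10(topic_freq) + c
    # Sigmoid squashes the score into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-z))


# Compare a smaller and a larger model on a rare and a common topic.
for n in (7e9, 70e9):
    for f in (1e2, 1e6):
        print(f"params={n:.0e} freq={f:.0e} recall={predicted_recall(n, f):.4f}")
```

With positive `a` and `b`, predicted recall rises with either variable, and a shortfall in one can be offset by the other, which is what a log-linear combination implies.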
Key takeaway
Factual recall is a predictable function of model size and of how often a topic appears in the training data. For research scientists optimizing LLM performance, this points to two levers for improving factual accuracy: increasing the representation of relevant topics in training datasets and scaling model parameters. The relationship also helps in designing more efficient training strategies and in predicting model capabilities on knowledge-intensive tasks.
Key insights
Factual recall in LLMs scales predictably with model size and topic frequency in training data.
Principles
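- Recall quality follows a sigmoid function of a log-linear combination of model size and topic frequency.
- Recall is gated by a signal-to-noise ratio: signal strength scales with concept frequency, and the noise floor scales with model capacity.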
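A minimal sketch of how a signal-to-noise picture can produce a sigmoid over a log-linear combination follows. It assumes power-law forms for signal and noise, assumes the noise floor falls as parameter count grows, and uses hypothetical exponents `alpha`, `beta` and a hypothetical `threshold`; none of these specifics come from the summary.

```python
import math


def snr_gated_recall(topic_freq, n_params, alpha=0.5, beta=0.5, threshold=1e6):
    """Recall gated by a signal-to-noise ratio (illustrative assumptions only).

    ASSUMPTIONS, not taken from the study:
      signal ~ topic_freq ** beta      (grows with concept frequency)
      noise  ~ n_params ** (-alpha)    (noise floor falls as capacity grows)
      recall = sigmoid(log(snr / threshold))
    """
    signal = topic_freq ** beta
    noise = n_params ** (-alpha)
    snr = signal / noise
    z = math.log(snr / threshold)
    return 1.0 / (1.0 + math.exp(-z))


def log_linear_recall(topic_freq, n_params, alpha=0.5, beta=0.5, threshold=1e6):
    """The same quantity written directly as a sigmoid of a log-linear score."""
    z = beta * math.log(topic_freq) + alpha * math.log(n_params) - math.log(threshold)
    return 1.0 / (1.0 + math.exp(-z))


# The two forms agree up to floating-point error.
print(snr_gated_recall(1e4, 7e9), log_linear_recall(1e4, 7e9))
```

Under these assumptions, taking the logarithm of the ratio turns the power laws into a weighted sum of log frequency and log parameter count, so the gate becomes a sigmoid over a log-linear score.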
Method
Evaluated 38 LLMs on >8,900 scholarly references using an automated verification system to link factual recall to model size and training data composition.
In practice
- Prioritize training data topic frequency.
- Scale model parameters for better recall.
Topics
- Large Language Models
- Factual Recall
- Scaling Laws
- Model Size
- Training Data Composition
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.