Predictable Confabulations: Factual Recall by LLMs Scales with Model Size and Topic Frequency

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new study evaluated 38 large language models (LLMs) on more than 8,900 scholarly references, using an automated verification system to establish a scaling law for factual recall. Recall quality follows a sigmoid function of a log-linear combination of two variables: model parameter count and how frequently the topic is represented in the training data. Together, these two variables explain 60% of the variance across 16 dense models from four distinct families, rising to 74-94% within individual model families. The relationship is consistent with a superposition-inspired model in which factual recall is gated by a signal-to-noise ratio, with signal strength scaling with concept frequency and the noise floor scaling with model capacity.
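The functional form described in the summary can be illustrated with a minimal sketch. The function name, the weights, the bias, and the use of a raw occurrence count as the measure of topic frequency are all hypothetical placeholders chosen for illustration; the summary does not report the study's fitted coefficients or its exact frequency measure.

```python
import math


def predicted_recall(n_params, topic_freq, w_size=1.0, w_freq=1.0, bias=-12.0):
    """Sigmoid of a log-linear combination of model size and topic frequency.

    n_params:   model parameter count (e.g. 7e9 for a 7B model)
    topic_freq: assumed proxy for how often the topic appears in training data
    w_size, w_freq, bias: hypothetical coefficients, NOT values from the study
    """
    # Log-linear combination of the two predictors.
    z = w_size * math.log10(n_params) + w_freq * math.log10(topic_freq) + bias
    # Logistic (sigmoid) link maps the combined score to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # With these placeholder coefficients, recall rises with either predictor.
    for n_params in (1e9, 1e10, 1e11):
        for topic_freq in (1e2, 1e3, 1e4):
            score = predicted_recall(n_params, topic_freq)
            print(f"params={n_params:.0e} freq={topic_freq:.0e} recall={score:.3f}")
```

Under this form, a tenfold increase in parameter count and a tenfold increase in topic frequency shift the sigmoid's input by fixed amounts (w_size and w_freq respectively), which is what makes the trade-off between scale and data coverage predictable once the coefficients are fitted.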

Key takeaway

For research scientists optimizing LLM performance, the central point is that factual recall is a predictable function of model size and of topic frequency in the training data. Factual accuracy can be improved by increasing the representation of relevant topics in training datasets and by scaling model parameters. The same relationship helps in designing more efficient training strategies and in predicting model capabilities on knowledge-intensive tasks.

Key insights

Factual recall in LLMs scales predictably with model size and topic frequency in training data.

Principles

Method

Evaluated 38 LLMs on >8,900 scholarly references using an automated verification system to link factual recall to model size and training data composition.

Best for: Research Scientist, AI Scientist, Machine Learning Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.