Multi-Hop Knowledge Composition is Bound by Pretraining Exposure
Summary
Large Language Models exhibit a persistent failure in implicit multi-hop reasoning, even when the individual facts are reliably memorized and retrievable. A model may correctly answer "When was X born?" and "Who is Y's closest friend?" (answer: X), yet fail to answer "When was Y's closest friend born?" in a single pass. The study examined this phenomenon in a controlled natural language setting. The researchers confirmed that the compositional failure persists even at 97% 1-hop accuracy, which identifies it as a pretraining deficiency rather than missing knowledge. After testing nine data-centric augmentation formats, the study found that compositional pretraining transfers to unseen questions about individuals who appeared in such contexts during pretraining. This transfer does not extend to individuals absent from compositional pretraining, suggesting that exposure to compositional contexts during pretraining is a necessary condition for implicit multi-hop reasoning.
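To make the distinction concrete, here is a minimal sketch of what the composite question demands. The facts, names, and the `answer_two_hop` helper are hypothetical illustrations, not taken from the study. Chaining two explicit lookups is trivial in code; the failure described above is that a model does not perform this chaining internally, in one pass, without writing out the intermediate answer.

```python
from typing import Optional

# Hypothetical 1-hop facts (illustrative names, not from the study).
BIRTH_YEAR: dict[str, int] = {"Alice Moreau": 1961, "Ravi Desai": 1974}
CLOSEST_FRIEND: dict[str, str] = {"Ravi Desai": "Alice Moreau"}


def answer_two_hop(person: str) -> Optional[int]:
    """Answer "When was <person>'s closest friend born?" by chaining two lookups."""
    bridge = CLOSEST_FRIEND.get(person)  # hop 1: resolve the bridge entity (X)
    if bridge is None:
        return None
    return BIRTH_YEAR.get(bridge)  # hop 2: look up the bridge entity's birth year


print(answer_two_hop("Ravi Desai"))  # 1961
```

Here "Ravi Desai" plays the role of Y and "Alice Moreau" the role of X in the example questions.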
Key takeaway
For NLP engineers developing or fine-tuning Large Language Models: implicit multi-hop reasoning appears to depend directly on pretraining exposure. If your model struggles with composite questions despite knowing the individual facts, consider augmenting your pretraining data with diverse compositional contexts, especially for critical entities. According to the study, this exposure is what lets multi-hop reasoning transfer to unseen questions; do not expect the benefit to carry over to entities that never appear in compositional contexts.
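A minimal sketch of compositional augmentation, assuming facts are stored as simple relation dictionaries. The templates, names, and the `compositional_contexts` helper are hypothetical; the study's nine augmentation formats are not reproduced here. Each generated sentence states the composed fact in a single context, so the subject, the bridge entity, and the second-hop attribute co-occur in pretraining text.

```python
import random

# Hypothetical 1-hop facts (illustrative names, not from the study).
BIRTH_YEAR: dict[str, int] = {"Alice Moreau": 1961, "Ravi Desai": 1974}
CLOSEST_FRIEND: dict[str, str] = {"Ravi Desai": "Alice Moreau"}

# Hypothetical templates: each states the composed fact in one context, so the
# subject, the bridge entity, and the second-hop attribute co-occur.
TEMPLATES = [
    "{subj}'s closest friend, {bridge}, was born in {year}.",
    "{subj} is close friends with {bridge}, who was born in {year}.",
    "Born in {year}, {bridge} is the closest friend of {subj}.",
]


def compositional_contexts(k: int = 2, seed: int = 0) -> list[str]:
    """Emit up to k distinct template variants per (subject, bridge) pair."""
    rng = random.Random(seed)
    docs: list[str] = []
    for subj, bridge in CLOSEST_FRIEND.items():
        year = BIRTH_YEAR.get(bridge)
        if year is None:
            continue  # skip pairs whose second-hop fact is unknown
        for template in rng.sample(TEMPLATES, k=min(k, len(TEMPLATES))):
            docs.append(template.format(subj=subj, bridge=bridge, year=year))
    return docs


for doc in compositional_contexts():
    print(doc)
```

Sampling several distinct templates per pair is one simple way to add the "diverse" contexts the takeaway recommends; the random seed keeps the generated corpus reproducible.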
Key insights
LLMs' implicit multi-hop reasoning failure is due to insufficient compositional pretraining exposure, not a knowledge gap.
Principles
- Compositional failure persists despite high 1-hop accuracy.
- Pretraining exposure to compositional contexts is crucial.
- Transferability is limited to exposed individuals.
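The first principle can be quantified by comparing 1-hop accuracy with single-pass composite accuracy, ideally restricted to items whose component facts the model already answers correctly. A minimal sketch; the `ItemResult` record and the metric names are hypothetical and are not the study's evaluation code.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    hop1_correct: bool      # model answered the first 1-hop question correctly
    hop2_correct: bool      # model answered the second 1-hop question correctly
    composed_correct: bool  # model answered the composite question in one pass


def composition_report(results: list[ItemResult]) -> dict[str, float]:
    """Contrast 1-hop accuracy with composite accuracy (hypothetical metrics)."""
    if not results:
        raise ValueError("need at least one result")
    n = len(results)
    one_hop_acc = sum(r.hop1_correct + r.hop2_correct for r in results) / (2 * n)
    two_hop_acc = sum(r.composed_correct for r in results) / n
    # Composite accuracy only on items where both component facts are known:
    # a low value here isolates composition failure from missing knowledge.
    both = [r for r in results if r.hop1_correct and r.hop2_correct]
    conditional = sum(r.composed_correct for r in both) / len(both) if both else 0.0
    return {
        "one_hop_acc": one_hop_acc,
        "two_hop_acc": two_hop_acc,
        "two_hop_given_both_known": conditional,
    }
```

The conditional score is the one that matters for the principle above: a high `one_hop_acc` alongside a low `two_hop_given_both_known` indicates that the knowledge is present but is not being composed.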
Method
The study used a controlled natural language setting and split individuals by whether they appeared in compositional contexts during pretraining. Nine data-centric augmentation formats were tested to measure whether compositional training transfers to unseen multi-hop questions in each group.
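The evaluation design can be sketched as follows, assuming each held-out composite question is tagged with the individual it concerns and that `predict` wraps a single-pass model call. Only individuals in the exposed group would receive compositional pretraining text. The split fraction, the function names, and the `predict` interface are hypothetical, not the study's code.

```python
import random
from typing import Callable


def split_by_exposure(
    individuals: list[str], exposed_frac: float, seed: int = 0
) -> tuple[set[str], set[str]]:
    """Randomly assign individuals to exposed / unexposed groups."""
    rng = random.Random(seed)
    shuffled = individuals[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * exposed_frac)
    return set(shuffled[:cut]), set(shuffled[cut:])


def accuracy_by_group(
    questions: list[tuple[str, str, str]],
    exposed: set[str],
    predict: Callable[[str], str],
) -> dict[str, float]:
    """questions: (individual, composite question, gold answer), held out from training."""
    tallies: dict[str, list[int]] = {"exposed": [0, 0], "unexposed": [0, 0]}  # [correct, total]
    for person, question, gold in questions:
        group = "exposed" if person in exposed else "unexposed"
        tallies[group][0] += predict(question) == gold
        tallies[group][1] += 1
    return {g: c / t for g, (c, t) in tallies.items() if t > 0}
```

Reporting the two groups separately is what reveals the limit on transferability: a high aggregate score can hide near-zero accuracy on unexposed individuals.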
In practice
- Prioritize compositional context in pretraining data.
- Ensure diverse compositional examples for all entities.
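Both recommendations imply auditing the corpus entity by entity. A minimal sketch, assuming each compositional document has already been tagged with the entity it covers and a format label. The tagging step, the format names, and the `min_formats` threshold are hypothetical choices, not values from the study.

```python
from collections import defaultdict


def coverage_gaps(
    tagged_docs: list[tuple[str, str]], entities: list[str], min_formats: int = 2
) -> dict[str, int]:
    """Return entities with fewer than min_formats distinct compositional formats.

    tagged_docs: (entity, format_name) pairs describing compositional contexts.
    """
    formats: dict[str, set[str]] = defaultdict(set)
    for entity, fmt in tagged_docs:
        formats[entity].add(fmt)
    # Entities never seen in a compositional context report 0 formats.
    return {e: len(formats[e]) for e in entities if len(formats[e]) < min_formats}
```

Entities that report zero formats are the ones the study's findings flag as most at risk: without compositional exposure, multi-hop questions about them are not expected to benefit from transfer.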
Topics
- Large Language Models
- Multi-hop Reasoning
- Pretraining
- Data Augmentation
- Knowledge Composition
Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.