Multi-Hop Knowledge Composition is Bound by Pretraining Exposure

2026-06-08 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

Large Language Models show a persistent failure in implicit multi-hop reasoning, even when each individual fact is memorized and retrievable. A model may correctly answer "When was X born?" and "Who is Y's closest friend?" (where X is Y's closest friend), yet fail on "When was Y's closest friend born?" when it must compose the two facts in a single pass without writing out the intermediate answer. The study examined this failure in a controlled natural language setting. The compositional failure persisted even at 97% 1-hop accuracy, establishing it as a deficiency of pretraining rather than missing knowledge. After testing nine data-centric augmentation formats, the study found that compositional pretraining transfers to unseen questions about individuals who appeared in compositional contexts during pretraining. It does not transfer to individuals absent from compositional pretraining. This suggests that exposure to compositional contexts during pretraining is a necessary condition for implicit multi-hop reasoning.
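To make the 1-hop versus 2-hop distinction concrete, here is a minimal sketch of the kind of synthetic knowledge base such a controlled setting could use. The individuals, birth years, and helper names (Person, PEOPLE, born, friend, one_hop_questions, two_hop_question) are all hypothetical and invented for illustration; they are not the study's data or code. The point it shows is that the composite answer is simply the composition born(friend(Y)) of two atomic lookups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """One individual in a hypothetical toy knowledge base."""
    name: str
    birth_year: int
    closest_friend: str  # name of another Person in PEOPLE


# Invented individuals for illustration only; not data from the study.
PEOPLE = {
    p.name: p
    for p in (
        Person("Alice Moreau", 1962, "Bruno Keller"),
        Person("Bruno Keller", 1975, "Chen Wei"),
        Person("Chen Wei", 1988, "Alice Moreau"),
    )
}


def born(name: str) -> int:
    """1-hop relation: person -> birth year."""
    return PEOPLE[name].birth_year


def friend(name: str) -> str:
    """1-hop relation: person -> closest friend."""
    return PEOPLE[name].closest_friend


def one_hop_questions(name: str) -> list[tuple[str, str]]:
    """Atomic questions; each needs a single stored fact."""
    return [
        (f"When was {name} born?", str(born(name))),
        (f"Who is {name}'s closest friend?", friend(name)),
    ]


def two_hop_question(name: str) -> tuple[str, str]:
    """Composite question whose answer is born(friend(name)).

    Answering it implicitly means resolving the bridge entity
    (the friend) internally, without stating it in the output.
    """
    return (f"When was {name}'s closest friend born?", str(born(friend(name))))
```

For example, two_hop_question("Alice Moreau") pairs the composite question with "1975": the friend lookup yields Bruno Keller, and his birth year is 1975. A model can know both atomic facts and still miss this answer, which is the failure the study measures.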

Key takeaway

For NLP Engineers developing or fine-tuning Large Language Models: implicit multi-hop reasoning depends directly on pretraining exposure. If your model fails composite questions despite knowing the individual facts, consider augmenting your pretraining data with diverse compositional contexts, especially for critical entities. According to the study, such exposure lets a model answer unseen composite questions about the entities it covers, but the benefit does not carry over to entities that never appeared in compositional contexts.

Key insights

LLMs' implicit multi-hop reasoning failure is due to insufficient compositional pretraining exposure, not a knowledge gap.

Principles

Compositional failure persists despite high 1-hop accuracy.
Pretraining exposure to compositional contexts is crucial.
Transfer to unseen questions is limited to individuals exposed to compositional contexts during pretraining.
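These principles become measurable once 1-hop and 2-hop accuracy are scored separately for individuals who were and were not exposed to compositional contexts. The sketch below assumes evaluation results are already available as (group, hops, correct) records; the group labels "exposed" and "held_out" and both helper names are hypothetical, not taken from the study.

```python
from collections import defaultdict
from typing import Iterable


def accuracy_by_group(
    records: Iterable[tuple[str, int, bool]],
) -> dict[tuple[str, int], float]:
    """Accuracy per (exposure group, hop count).

    Each record is (group, hops, correct), e.g. ("exposed", 2, True).
    """
    totals: dict[tuple[str, int], int] = defaultdict(int)
    hits: dict[tuple[str, int], int] = defaultdict(int)
    for group, hops, correct in records:
        totals[(group, hops)] += 1
        hits[(group, hops)] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}


def composition_gap(acc: dict[tuple[str, int], float], group: str) -> float:
    """1-hop accuracy minus 2-hop accuracy for one group.

    A large gap means the atomic facts are known but not composed.
    Raises KeyError if either hop count is missing for the group.
    """
    return acc[(group, 1)] - acc[(group, 2)]
```

Under the study's findings, one would expect high 1-hop accuracy in both groups, with 2-hop gains (a shrinking composition_gap) appearing only in the exposed group. Any records you feed in should come from your own evaluation; the values in the test are placeholders.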

Method

The study used a controlled natural language setting, separating individuals by whether they appeared in compositional contexts during pretraining. Nine data-centric augmentation formats were tested to evaluate how well compositional ability transfers to unseen questions.
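The exposure split can be sketched as follows, under loudly stated assumptions. Every individual receives atomic 1-hop facts, and only the exposed group also receives a compositional sentence stating both hops together. The single template used here is hypothetical and is not one of the study's nine formats, which this summary does not describe. To keep evaluation questions unseen, the sketch assumes compositional contexts mention the friend's birth year while the test asks about the friend's birth city. All function names and the (birth_year, birth_city, closest_friend) record layout are illustrative.

```python
import random


def split_individuals(names, exposed_fraction=0.5, seed=0):
    """Randomly partition individuals into an exposed group and a held-out group."""
    pool = sorted(names)  # sort first so the split is reproducible for a given seed
    random.Random(seed).shuffle(pool)
    cut = int(len(pool) * exposed_fraction)
    return set(pool[:cut]), set(pool[cut:])


def build_corpus(people, exposed):
    """Build toy pretraining documents.

    people maps name -> (birth_year, birth_city, closest_friend).
    Everyone gets atomic facts; only exposed individuals also get a
    compositional context (one hypothetical format).
    """
    docs = []
    for name, (year, city, friend) in people.items():
        docs.append(f"{name} was born in {year}.")
        docs.append(f"{name} was born in {city}.")
        docs.append(f"{name}'s closest friend is {friend}.")
        if name in exposed:
            friend_year = people[friend][0]
            docs.append(f"{name}'s closest friend, {friend}, was born in {friend_year}.")
    return docs


def eval_questions(people, exposed):
    """Unseen 2-hop questions (friend's birth city), tagged by exposure group."""
    questions = []
    for name, (_, _, friend) in people.items():
        group = "exposed" if name in exposed else "held_out"
        question = f"In which city was {name}'s closest friend born?"
        questions.append((group, question, people[friend][1]))
    return questions
```

The design choice that matters is that both groups share identical atomic facts, so any 2-hop accuracy difference between them can be attributed to compositional exposure rather than to missing knowledge.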

In practice

Prioritize compositional context in pretraining data.
Ensure diverse compositional examples for all entities.
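A minimal sketch of these two practices is to audit which entities lack compositional coverage in enough distinct formats, then fill the gaps from several phrasings of the same two-hop chain. The templates, the min_formats threshold, and the (subject, format_id, text) record layout are all hypothetical choices for illustration; the study's own augmentation formats are not reproduced here.

```python
# Hypothetical templates; each phrases the same two-hop chain differently.
TEMPLATES = [
    "{name}'s closest friend, {friend}, was born in {year}.",
    "{friend}, who is {name}'s closest friend, was born in {year}.",
    "The closest friend of {name} is {friend}, born in {year}.",
]


def under_covered(entities, comp_docs, min_formats=2):
    """Return entities whose compositional contexts use fewer than min_formats formats.

    comp_docs is a list of (subject, format_id, text) records.
    """
    formats = {e: set() for e in entities}
    for subject, fmt, _ in comp_docs:
        if subject in formats:
            formats[subject].add(fmt)
    return sorted(e for e, seen in formats.items() if len(seen) < min_formats)


def augment(people, targets):
    """Generate compositional contexts for each target in every template.

    people maps name -> (birth_year, birth_city, closest_friend).
    """
    docs = []
    for name in targets:
        friend = people[name][2]
        friend_year = people[friend][0]
        for fmt, template in enumerate(TEMPLATES):
            text = template.format(name=name, friend=friend, year=friend_year)
            docs.append((name, fmt, text))
    return docs
```

Running under_covered before and after augment gives a simple check that every entity you care about has compositional context in more than one phrasing. Given the study's finding, entities left uncovered should not be expected to benefit from other entities' coverage.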

Topics

Large Language Models
Multi-hop Reasoning
Pretraining
Data Augmentation
Knowledge Composition

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.