Improving Parametric Knowledge Access in Reasoning Language Models
Summary
This research investigates how reasoning language models access world knowledge stored in their parameters, and finds that their default reasoning often underperforms on knowledge recall. Adding a simple "think step-by-step" prompt significantly improves knowledge recall, but not mathematical reasoning. Motivated by this gap, the authors propose training models with reinforcement learning to reason over their parametric knowledge, using world-knowledge question answering as a verifiable reward signal. Training on TriviaQA yielded a 9.9% performance increase on that dataset, and subsequent evaluations showed gains on Natural Questions (+4.2%), HotpotQA (+2.1%), SimpleQA (+0.6%), and StrategyQA (+3.0%), indicating that models can be trained to access their parametric knowledge more effectively.
Key takeaway
NLP engineers building knowledge-intensive applications should note that current reasoning models are under-optimized for parametric knowledge access. A simple step-by-step prompt can improve recall immediately, and reinforcement learning on world-knowledge QA datasets offers further gains (+9.9% on TriviaQA in the reported experiments).
Key insights
Reasoning language models can be trained to better access their internal world knowledge.
Principles
- Default reasoning underperforms for knowledge recall.
- Step-by-step prompting aids knowledge access.
Method
World-knowledge QA tasks supply a verifiable reward for reinforcement learning: a model's answer can be checked against known references. Training on this signal teaches models to reason over their parametric knowledge and improves recall across diverse datasets.
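A verifiable reward of this kind can be sketched as an exact-match check against reference answers. The code below is a minimal illustration, not the authors' implementation: the function names, the "Answer:" output format, and the normalization rules are all assumptions made for the example.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_final_answer(completion: str) -> str:
    """Return the rest of the line after an 'Answer:' marker.

    The 'Answer:' format is a hypothetical convention for this sketch.
    If several lines carry the marker, the last one is used.
    """
    matches = re.findall(r"answer:[ \t]*(.*)", completion, flags=re.IGNORECASE)
    return matches[-1].strip() if matches else ""


def qa_reward(completion: str, gold_aliases: list[str]) -> float:
    """Binary reward: 1.0 if the normalized answer matches any gold alias."""
    predicted = normalize_answer(extract_final_answer(completion))
    if not predicted:
        return 0.0  # no parseable answer earns no reward
    gold = {normalize_answer(alias) for alias in gold_aliases}
    return 1.0 if predicted in gold else 0.0


if __name__ == "__main__":
    trace = "Major cities include Sydney and Melbourne...\nAnswer: Canberra."
    print(qa_reward(trace, ["Canberra"]))
```

A binary reward keeps the signal easy to verify. A real training setup would pass a function like this to a reinforcement learning trainer, a step this sketch leaves out.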
In practice
- Use "think step-by-step" for knowledge recall.
- Fine-tune models with reinforcement learning on world-knowledge QA to improve knowledge access.
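The prompting tip can be sketched as a small prompt builder. The source only says that a "think step-by-step" cue is added. The exact cue wording, the `build_prompt` helper, and the "Answer:" output instruction below are illustrative assumptions.

```python
# Hypothetical cue wording; the source only names a "think step-by-step" prompt.
STEP_BY_STEP_CUE = "Think step-by-step."


def build_prompt(question: str, use_cue: bool = True) -> str:
    """Build a knowledge-recall prompt, optionally adding the reasoning cue."""
    parts = [f"Question: {question.strip()}"]
    if use_cue:
        parts.append(STEP_BY_STEP_CUE)
    # Assumed output convention so the final answer is easy to parse.
    parts.append("Finish with a line of the form 'Answer: <your answer>'.")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("What is the capital of Australia?"))
```

Keeping the cue behind a flag makes it easy to compare recall with and without it on your own evaluation set.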
Topics
- Parametric Knowledge Access
- Reasoning Language Models
- Reinforcement Learning
- Question Answering
- World Knowledge
Best for: NLP Engineer, Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.