On-Policy Context Distillation for Language Models
Summary
On-Policy Context Distillation (OPCD) is a framework for internalizing in-context knowledge directly into a language model's parameters. It trains a student model on trajectories the student itself generates, minimizing the reverse Kullback-Leibler divergence between the student and a teacher model that is conditioned on the relevant context. The framework is applied in two settings: experiential knowledge distillation, where the model consolidates knowledge from past solution traces, and system prompt distillation, where the model internalizes behaviors from optimized prompts. Across mathematical reasoning, text-based games, and domain-specific tasks, OPCD outperforms baseline methods in task accuracy while preserving out-of-distribution capabilities. It also supports cross-size distillation, in which a smaller student model learns from a larger teacher.
Key takeaway
For AI engineers building more efficient and capable language models, OPCD offers a way to move contextual knowledge and prompt-induced behaviors into the model parameters, so the context no longer has to be supplied at inference time. Consider OPCD when you need to improve task accuracy without sacrificing out-of-distribution performance, especially when distilling from a larger teacher into a smaller, more deployable student model.
Key insights
OPCD enables language models to internalize in-context knowledge by training on self-generated trajectories against a context-conditioned teacher.
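One way to write this objective down, as a sketch rather than the paper's exact formulation: the symbols below are assumed notation, with x a task input, c the context (solution traces or a system prompt), \pi_\theta the student that never sees c, and \pi_T the teacher conditioned on c. The per-token sum and the absence of length normalization are also assumptions.

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
    \left[ \sum_{t=1}^{|y|}
      D_{\mathrm{KL}}\!\left(
        \pi_\theta(\cdot \mid x, y_{<t})
        \,\middle\|\,
        \pi_T(\cdot \mid c, x, y_{<t})
      \right)
    \right]
```

The trajectory y is sampled from the student, which is what makes the procedure on-policy, and the student distribution is the first argument of the divergence, which is what makes it the reverse KL.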
Principles
- Distill knowledge from historical solution traces.
- Internalize behaviors from optimized system prompts.
Method
Train a student model on its own generated trajectories, minimizing reverse Kullback-Leibler divergence against a context-conditioned teacher model.
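The loss in the step above can be illustrated with a small, dependency-free sketch. It assumes the student and teacher share a vocabulary and that per-position logits are already available for a trajectory the student sampled; the function names (`reverse_kl`, `opcd_loss`) and the toy logits are hypothetical and not taken from the original work, and no gradient step or sampling loop is shown.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at one token position.

    The expectation is taken under the student distribution,
    which is what makes this the reverse direction.
    """
    log_p = log_softmax(student_logits)  # student: sees only the task input
    log_q = log_softmax(teacher_logits)  # teacher: also sees the context
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def opcd_loss(student_steps, teacher_steps):
    """Mean per-token reverse KL over one student-sampled trajectory.

    Averaging over positions is an assumption made for this sketch.
    """
    kls = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(kls) / len(kls)

# Toy trajectory with two token positions and a three-token vocabulary.
# Both models are scored on the SAME student-generated prefix.
student_steps = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher_steps = [[1.5, 1.0, -0.5], [0.1, 0.2, 0.3]]
loss = opcd_loss(student_steps, teacher_steps)
print(f"mean reverse KL: {loss:.4f}")
```

In an actual training loop the logits would come from forward passes of the two models on the student's sampled tokens, and the loss would be backpropagated through the student only; this sketch covers just the divergence computation.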
In practice
- Apply to mathematical reasoning tasks.
- Use for text-based game agents.
- Enable cross-size model distillation.
Topics
- On-Policy Context Distillation
- Language Models
- Knowledge Distillation
- Experiential Knowledge
- System Prompt Distillation
Best for: AI Engineer, Machine Learning Engineer, Research Scientist, AI Researcher, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.