OPRD: On-Policy Representation Distillation

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

On-Policy Representation Distillation (OPRD) is a knowledge-distillation method for large language models that addresses two limitations of standard On-Policy Distillation (OPD). First, OPD supervises the student only in output space, so its Monte Carlo KL estimates over large vocabularies (for example, Qwen's ~150k tokens) carry persistent sampling variance. Second, OPD treats the teacher as a black box. OPRD instead aligns student and teacher hidden-state representations at selected layers on the same rollouts and bypasses the language model head entirely. In theory, this removes the sampling variance and exposes richer per-layer structural information. Empirically, OPRD closes the student-teacher performance gap on AIME 2024/2025 and AIMO, where output-space OPD baselines plateau. It also trains 1.44x faster and uses 54% less memory than top-k OPD.
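
A toy example makes the variance argument concrete. The summary does not state OPD's exact objective, so the sketch below assumes a reverse KL, KL(student || teacher), estimated at each position from a single token sampled from the student. This is a common choice in on-policy distillation, not necessarily the paper's. The logits are synthetic random numbers and every name is hypothetical. The exact full-vocabulary sum is one fixed number, while each single-sample estimate scatters around it.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def exact_reverse_kl(p_student, p_teacher):
    """KL(student || teacher) summed over the full vocabulary (no sampling)."""
    return sum(ps * (math.log(ps) - math.log(pt))
               for ps, pt in zip(p_student, p_teacher) if ps > 0)


def sampled_reverse_kl(p_student, p_teacher, num_samples, rng):
    """Assumed OPD-style estimator: sample v ~ student, score log ps(v) - log pt(v)."""
    vocab = range(len(p_student))
    tokens = rng.choices(vocab, weights=p_student, k=num_samples)
    return [math.log(p_student[v]) - math.log(p_teacher[v]) for v in tokens]


rng = random.Random(0)
VOCAB_SIZE = 32_000  # toy size; real tokenizers can reach ~150k entries
# Synthetic logits: the teacher is the student plus Gaussian noise.
student_logits = [rng.gauss(0.0, 2.0) for _ in range(VOCAB_SIZE)]
teacher_logits = [s + rng.gauss(0.0, 1.0) for s in student_logits]
p_s, p_t = softmax(student_logits), softmax(teacher_logits)

exact = exact_reverse_kl(p_s, p_t)
estimates = sampled_reverse_kl(p_s, p_t, num_samples=2_000, rng=rng)
mean = sum(estimates) / len(estimates)
var = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
print(f"exact KL={exact:.4f}  MC mean={mean:.4f}  per-sample variance={var:.4f}")
```

The sample mean tracks the exact value, but each per-position estimate has nonzero variance. Averaging more samples shrinks the variance of the mean without removing it. Computing the exact sum removes it, but that requires both models' full distributions at every position, which is what makes large vocabularies costly.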

Key takeaway

For machine learning engineers optimizing large language models, OPRD is a compelling alternative to standard on-policy distillation. If sampling variance or high memory usage limits your distillation runs, consider OPRD's hidden-state alignment. In the reported experiments, it closes the student-teacher performance gap on complex reasoning benchmarks such as AIME and AIMO. It also trains 1.44x faster and uses 54% less memory than top-k OPD.

Key insights

OPRD improves on-policy distillation by aligning hidden states, eliminating sampling variance and providing richer structural information.

Principles

Method

OPRD aligns student and teacher hidden-state representations across selected layers on the same rollouts, entirely bypassing the language model head to eliminate sampling variance.
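
The summary does not specify OPRD's distance function, which layers are paired, or how student and teacher hidden sizes are reconciled. The pure-Python sketch below therefore rests on stated assumptions: mean squared error as the distance, a hypothetical `LAYER_MAP` that pairs student layers with teacher layers, and a hypothetical linear projection `W` from the student width to the teacher width. The sketch also assumes both models have already encoded the same student rollout. Because no logits are formed and no token is sampled, the loss is a deterministic function of the hidden states.

```python
from typing import Dict, List, Sequence

Vector = List[float]
LayerStates = List[Vector]  # one hidden-state vector per rollout token


def project(vec: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical linear map from student width d_s to teacher width d_t (weight is d_t x d_s)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def representation_alignment_loss(
    student_states: Dict[int, LayerStates],
    teacher_states: Dict[int, LayerStates],
    layer_map: Dict[int, int],
    weight: Sequence[Sequence[float]],
) -> float:
    """Assumed OPRD-style loss: MSE between projected student and teacher states.

    Index i in each layer's list must refer to the same rollout token.
    No LM head is applied, and nothing is sampled.
    """
    total, count = 0.0, 0
    for s_layer, t_layer in layer_map.items():
        s_tokens, t_tokens = student_states[s_layer], teacher_states[t_layer]
        if len(s_tokens) != len(t_tokens):
            raise ValueError("student and teacher must be run on the same rollout")
        for h_s, h_t in zip(s_tokens, t_tokens):
            for a, b in zip(project(h_s, weight), h_t):
                total += (a - b) ** 2
                count += 1
    return total / count


# Toy setup: student width 2, teacher width 3. Student layers 1 and 2 are
# paired with teacher layers 2 and 4 (a hypothetical layer selection).
LAYER_MAP = {1: 2, 2: 4}
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
student = {1: [[1.0, 2.0], [0.0, 1.0]], 2: [[2.0, 0.0], [1.0, 1.0]]}

# A teacher whose states equal the projected student states gives zero loss.
# Shifting every teacher coordinate by 1.0 gives an MSE of exactly 1.0.
teacher_match = {t: [project(h, W) for h in student[s]] for s, t in LAYER_MAP.items()}
teacher_shift = {t: [[x + 1.0 for x in h] for h in hs] for t, hs in teacher_match.items()}

loss_match = representation_alignment_loss(student, teacher_match, LAYER_MAP, W)
loss_shift = representation_alignment_loss(student, teacher_shift, LAYER_MAP, W)
print(loss_match, loss_shift)
```

In an actual training setup, the hidden states would come from both models' forward passes over the student's rollout. The projection would be a learned parameter, and the loss would be backpropagated into the student with an autograd framework. This sketch only computes the forward value.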

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.