Escaping the KL Agreement Trap in On-Policy Distillation
Summary
On-policy distillation (OPD) can fall into a "low-KL agreement trap" in which the teacher model provides only weak token-level supervision. The trap arises when the student drifts into an unrecoverable state and the teacher locally agrees with the degraded output: reverse KL divergence is low, but the signal carries little correction. The researchers find that tokens generated during and after such a trap yield less useful supervision. To address this, they propose KAT (KL Agreement Trap Termination), an online termination rule for OPD. KAT detects persistent low-KL agreement with a training-adaptive threshold and filters out the weak supervision. Across four mathematical benchmarks, the reported gains are 2.66% in avg@k accuracy and 3.43% in pass@k, with average rollout length reduced by 59.73%.
Key takeaway
For machine learning engineers optimizing on-policy distillation, KAT (KL Agreement Trap Termination) is worth evaluating. Terminating rollouts that enter the "low-KL agreement trap" removes tokens that would otherwise provide only weak supervision. In the reported experiments on four mathematical benchmarks, KAT improved avg@k accuracy by 2.66% and pass@k by 3.43% while cutting average rollout length by 59.73%, which also lowers the computational cost of generating rollouts.
Key insights
Detecting and terminating low-KL agreement traps in on-policy distillation improves training efficiency and accuracy.
Principles
- Teacher agreement with degraded student states creates a low-KL trap.
- Weak supervision from degenerate agreement hinders OPD effectiveness.
- Dynamic thresholds can detect and filter unhelpful training signals.
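To make the first principle concrete, here is a minimal sketch of per-token reverse KL, KL(student || teacher), computed from raw logits. The function names and the toy three-token vocabulary are illustrative assumptions, not taken from the original article. It shows that when the teacher's distribution nearly matches the student's, the divergence is close to zero regardless of whether the student's choice is actually good.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """Reverse KL at one token position: sum_v p_s(v) * (log p_s(v) - log p_t(v))."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return sum(
        s * (math.log(s) - math.log(t))
        for s, t in zip(p_s, p_t)
        if s > 0.0
    )

# Toy logits (assumed for illustration only).
# Teacher prefers a different token than the student: large, corrective signal.
disagree = reverse_kl([4.0, 0.0, 0.0], [0.0, 4.0, 0.0])
# Teacher nearly mirrors the student: tiny signal, even if the student is off track.
agree = reverse_kl([4.0, 0.0, 0.0], [3.9, 0.1, 0.0])

print(f"disagree: {disagree:.3f}, agree: {agree:.5f}")
```

A low value here only says the two distributions match at this position; it says nothing about whether the rollout as a whole is still recoverable, which is the gap the trap exploits.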
Method
KAT is an online on-policy distillation termination rule that detects persistent low-KL agreement using a dynamic, training-adaptive threshold to filter weak supervision.
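The summary does not give KAT's exact detection criterion or threshold schedule, so the following is only a sketch of one way such a rule could look. It assumes "persistent" means a run of `patience` consecutive tokens below the threshold, and it assumes the adaptive threshold is a fixed fraction of an exponential moving average of mean token KL. The names `kat_cutoff`, `AdaptiveThreshold`, `patience`, `scale`, and `decay` are all hypothetical.

```python
def kat_cutoff(token_kls, threshold, patience):
    """Return how many leading tokens to keep for supervision.

    Assumption: a trap is a run of `patience` consecutive tokens whose
    reverse KL is below `threshold`. The run and everything after it
    are dropped, matching the claim that tokens during and after a
    trap give weak supervision.
    """
    run = 0
    for i, kl in enumerate(token_kls):
        run = run + 1 if kl < threshold else 0
        if run >= patience:
            return i + 1 - patience
    return len(token_kls)  # no trap detected, keep the full rollout


class AdaptiveThreshold:
    """Hypothetical training-adaptive threshold: scale * EMA(mean token KL)."""

    def __init__(self, scale=0.1, decay=0.9):
        self.scale = scale
        self.decay = decay
        self.ema = None

    def update(self, token_kls):
        """Fold one non-empty batch of token KLs into the EMA; return the new threshold."""
        batch_mean = sum(token_kls) / len(token_kls)
        if self.ema is None:
            self.ema = batch_mean
        else:
            self.ema = self.decay * self.ema + (1 - self.decay) * batch_mean
        return self.scale * self.ema


# Toy rollout: informative tokens, then a stretch of near-zero KL.
kls = [0.8, 0.6, 0.9, 0.01, 0.02, 0.01, 0.02, 0.5]
keep = kat_cutoff(kls, threshold=0.05, patience=3)
print(keep)  # 3: only the first three tokens are kept
```

Tying the threshold to a running statistic rather than a constant reflects the "training-adaptive" wording: as the student improves and typical KL shrinks, a fixed cutoff would flag healthy rollouts as traps.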
In practice
- Apply KAT during OPD training; the reported avg@k accuracy gain is 2.66%.
- Expect shorter rollouts: average rollout length fell by 59.73% in the reported experiments.
- The reported pass@k gain on four mathematical benchmarks is 3.43%.
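The summary does not define avg@k or pass@k, nor whether the gains are absolute points or relative changes. As a reference point, the sketch below uses one common reading: avg@k is the mean accuracy over k sampled answers per problem, and pass@k is the fraction of problems with at least one correct answer among k samples. The function names and the toy correctness grid are assumptions for illustration.

```python
def avg_at_k(correct):
    """Mean accuracy over the k samples of each problem, averaged over problems."""
    per_problem = [sum(row) / len(row) for row in correct]
    return sum(per_problem) / len(per_problem)

def pass_at_k(correct):
    """Fraction of problems where at least one of the k samples is correct."""
    return sum(1 for row in correct if any(row)) / len(correct)

# Toy grid: 3 problems, k = 4 samples each (True = correct answer).
correct = [
    [True, False, False, False],
    [False, False, False, False],
    [True, True, False, True],
]

print(f"avg@4  = {avg_at_k(correct):.3f}")
print(f"pass@4 = {pass_at_k(correct):.3f}")
```

Some evaluations instead estimate pass@k from more than k samples per problem, so reproduced numbers should follow the original paper's protocol.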
Topics
- On-policy Distillation
- KL Divergence
- Reinforcement Learning
- Language Models
- Supervised Learning
- Mathematical Benchmarks
- KAT Algorithm
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.