Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation
Summary
CKA-QAD is a proposed method for improving low-precision large language model (LLM) distillation, specifically for NVFP4 quantization. Conventional quantization-aware distillation (QAD) often relies on output matching alone, which can mask degradation of internal representations, particularly in models post-trained with reinforcement learning (RL). This internal drift, diagnosed with centered kernel alignment (CKA), correlates with lower performance on reasoning and coding tasks. CKA-QAD adds a lightweight regularizer that aligns layerwise Gram matrices through CKA, preserving internal representational geometry during distillation. The approach substantially improves representational alignment and raises downstream reasoning and coding accuracy for models such as Nemotron 3 Nano and Qwen3-4B-Thinking-2507, at only modest training overhead.
Key takeaway
Machine learning engineers deploying low-precision LLMs, particularly with NVFP4 quantization, should consider adding CKA-guided representational alignment to their distillation workflows. Relying on output matching alone risks internal degradation that hurts reasoning and coding performance. CKA-QAD improves representational alignment and downstream accuracy at modest training overhead, which makes quantized deployments more reliable.
Key insights
Output matching alone can hide internal degradation during LLM distillation; preserving internal representational geometry is crucial for downstream accuracy.
Principles
- Output matching can mask internal LLM degradation.
- Internal representational geometry impacts reasoning.
- CKA measures layerwise representational similarity.
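For reference, one common formulation is linear CKA, sketched below. The summary does not say which CKA variant the method uses, so the linear kernel and the symbols X, Y, K, L, and H are assumptions made here for illustration: X and Y stand for the activations of one layer in the teacher and in the quantized student, computed on the same n inputs.

```latex
\begin{align}
K &= X X^\top, \qquad L = Y Y^\top, \qquad H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top \\
\mathrm{HSIC}(K, L) &= \frac{1}{(n-1)^2}\,\operatorname{tr}(K H L H) \\
\mathrm{CKA}(K, L) &= \frac{\mathrm{HSIC}(K, L)}{\sqrt{\mathrm{HSIC}(K, K)\,\mathrm{HSIC}(L, L)}}
\end{align}
```

Under this definition CKA lies between 0 and 1 and is unchanged by orthogonal transformations and isotropic scaling of the features, so a value near 1 indicates that two layers arrange the same inputs in nearly the same geometry.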
Method
CKA-QAD adds a lightweight regularizer to QAD, aligning layerwise Gram matrices via CKA to preserve internal representational geometry during distillation.
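A minimal NumPy sketch of what such a regularizer might compute is shown here. It is not the authors' implementation: the use of linear CKA, the penalty form (one minus CKA, averaged over layers), and the names `center_gram`, `linear_cka`, and `cka_penalty` are all assumptions for illustration.

```python
import numpy as np


def center_gram(gram):
    """Double-center an (n, n) Gram matrix: H @ K @ H with H = I - 11^T / n."""
    n = gram.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ gram @ h


def linear_cka(x, y):
    """Linear CKA between activations x (n, d1) and y (n, d2) on the same n inputs."""
    k = center_gram(x @ x.T)
    l = center_gram(y @ y.T)
    # The HSIC normalization constants cancel in the ratio, so they are omitted.
    return np.sum(k * l) / np.sqrt(np.sum(k * k) * np.sum(l * l))


def cka_penalty(teacher_acts, student_acts):
    """Hypothetical regularizer: mean of (1 - CKA) over paired layers."""
    return float(np.mean([1.0 - linear_cka(t, s)
                          for t, s in zip(teacher_acts, student_acts)]))
```

In a training loop, a term like this would be weighted and added to the output-matching loss, and it would be written in an autodiff framework so gradients reach the student; NumPy is used here only to show the arithmetic. Calling `linear_cka` on each teacher and student layer pair also gives a per-layer diagnostic of representational drift.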
In practice
- Apply CKA-QAD for NVFP4 LLM accuracy recovery.
- Use CKA to diagnose internal representational drift.
- Improve reasoning and coding task performance.
Topics
- LLM Quantization
- NVFP4 Inference
- Knowledge Distillation
- CKA-QAD
- Representational Geometry
- Model Accuracy
Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.