Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillation

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

A new method, CKA-QAD, addresses the degradation of internal representations in NVFP4 large language model (LLM) distillation. Standard Quantization-Aware Distillation (QAD) recovers output accuracy, but it often reduces layerwise representational similarity, particularly in RL-post-trained models, which creates bottlenecks in reasoning and coding tasks. CKA-QAD augments the QAD objective with a Centered Kernel Alignment (CKA) regularizer that explicitly preserves the geometric structure of intermediate activations. Experiments on Nemotron 3 Nano and Qwen3-4B-Thinking-2507 show that CKA-QAD improves representational alignment, raising average CKA from 0.958 to 0.994 on Nemotron 3 Nano and from 0.98 to 0.99 on Qwen3-4B-Thinking-2507. The method also improves downstream reasoning and coding accuracy on benchmarks such as AIME25, GPQA-D, and LiveCodeBench-v5, at a cost of only 0.5% additional step time and 7.0% higher peak VRAM.

Key takeaway

For AI engineers deploying NVFP4 LLMs, relying solely on output-matching QAD risks degrading internal representations, which hurts reasoning and coding tasks. Adding CKA-guided representational alignment to the distillation pipeline preserves the internal geometry and improves accuracy on demanding benchmarks such as AIME25 and LiveCodeBench-v5, with minimal training overhead. Consider CKA-QAD when low-bit models must retain reasoning and coding performance.

Key insights

Output-matching QAD can degrade internal LLM representations; CKA-guided alignment preserves geometry for better low-bit accuracy.

Principles

Method

CKA-QAD augments standard QAD with a layerwise CKA regularizer that aligns the Gram matrices of intermediate activations. It uses top-k logit distillation and dynamically balances the CKA term against the KL loss.
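
The objective described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. It assumes linear CKA computed on double-centered Gram matrices, a KL term restricted to the teacher's top-k logits and renormalized, and a fixed scalar weight. This summary does not specify the kernel, the value of k, or the dynamic balancing rule, so the names linear_cka, topk_kl, cka_qad_loss and the weight parameter are hypothetical.

```python
import numpy as np


def center_gram(gram):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = gram.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ gram @ h


def linear_cka(x, y, eps=1e-12):
    """Linear CKA between activations x (n, d1) and y (n, d2) for the same n tokens."""
    k = center_gram(x @ x.T)
    l = center_gram(y @ y.T)
    hsic_kl = np.sum(k * l)  # unnormalized HSIC between the two Gram matrices
    norm = np.sqrt(np.sum(k * k) * np.sum(l * l))
    return float(hsic_kl / (norm + eps))


def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def topk_kl(teacher_logits, student_logits, k):
    """Mean KL(teacher || student) over the teacher's top-k tokens (assumed variant)."""
    idx = np.argsort(teacher_logits, axis=-1)[..., -k:]
    log_p = log_softmax(np.take_along_axis(teacher_logits, idx, axis=-1))
    log_q = log_softmax(np.take_along_axis(student_logits, idx, axis=-1))
    return float(np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)))


def cka_qad_loss(teacher_logits, student_logits, teacher_acts, student_acts,
                 k=4, weight=1.0):
    """KL term plus a layerwise (1 - CKA) penalty.

    `weight` is a fixed placeholder; the dynamic balancing mentioned in the
    text is not reproduced here because the summary does not describe it.
    """
    kl = topk_kl(teacher_logits, student_logits, k)
    penalty = np.mean([1.0 - linear_cka(t, s)
                       for t, s in zip(teacher_acts, student_acts)])
    return kl + weight * float(penalty)
```

Because linear CKA is invariant to orthogonal rotations and isotropic scaling of the features, the penalty stays near zero when the quantized model's activations differ from the teacher's only by such a transform. It grows when the pairwise similarity structure between tokens changes, which is the kind of drift an output-only KL loss does not see.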

In practice

Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.