LoRi: Low-Rank Distillation for Implicit Reasoning
Summary
LoRi (Low-Rank Distillation for Implicit Reasoning) is a new framework that targets a known weakness of large language models (LLMs): implicit chain-of-thought (iCoT) methods underperform explicit CoT prompting. The authors empirically find that hidden-state reasoning trajectories exhibit low-rank structure. LoRi exploits this by aligning teacher and student reasoning trajectories in a shared low-rank tensor subspace using first- and second-order statistics. This captures the global structure of the reasoning while keeping the student's latent process compact. Evaluated on LLaMA and Qwen models (0.5B to 8B parameters) across mathematical reasoning benchmarks such as GSM8K-Hard, LoRi consistently improves accuracy, by up to ~12%. It substantially narrows the gap to explicit CoT and outperforms prior iCoT distillation methods. It also trains more efficiently and delivers 5.1x to 6.9x faster inference.
Key takeaway
For machine learning engineers optimizing LLM inference, LoRi offers a way to approach explicit chain-of-thought accuracy at a fraction of the computational cost. It distills multi-step reasoning into a compact latent process, improving accuracy on tasks like GSM8K-Hard by up to ~10% while running inference 5.1x to 6.9x faster. Consider LoRi when you need stronger reasoning from a model but cannot afford the latency of generating an explicit chain of thought.
Key insights
Reasoning trajectories in LLMs exhibit low-rank structure, enabling efficient distillation into compact latent processes.
Principles
- Hidden-state reasoning trajectories have low-rank structure.
- Global reasoning geometry can be transferred efficiently.
- Low-rank alignment supports length-invariant distillation.
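The first and third principles can be made concrete with a small sketch. Assume one layer's hidden states over the reasoning steps are stacked into a (steps x hidden_dim) matrix. In that case, "low-rank" means a few singular directions carry almost all of the variance. The first- and second-order statistics (mean and covariance over steps) then have shapes that depend only on the hidden size, not on trajectory length. The helper names `energy_rank` and `trajectory_stats`, the 95% energy threshold, and the synthetic data are illustrative assumptions, not the paper's code.

```python
import numpy as np


def energy_rank(trajectory: np.ndarray, energy: float = 0.95) -> int:
    """Smallest k whose top-k singular values hold `energy` of the squared
    spectral mass of a centered (steps x hidden_dim) trajectory."""
    centered = trajectory - trajectory.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy) + 1)


def trajectory_stats(trajectory: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Mean and covariance over the step axis; shapes depend only on hidden_dim."""
    mean = trajectory.mean(axis=0)
    centered = trajectory - mean
    cov = centered.T @ centered / trajectory.shape[0]
    return mean, cov


# Synthetic stand-in for hidden states (an assumption): a rank-3 signal in 64 dims plus tiny noise.
rng = np.random.default_rng(0)
steps, hidden_dim, true_rank = 200, 64, 3
signal = rng.normal(size=(steps, true_rank)) @ rng.normal(size=(true_rank, hidden_dim))
trajectory = signal + 1e-3 * rng.normal(size=(steps, hidden_dim))

print("effective rank:", energy_rank(trajectory))
long_mean, long_cov = trajectory_stats(trajectory)        # statistics over 200 steps
short_mean, short_cov = trajectory_stats(trajectory[:5])  # statistics over 5 steps
print("same statistic shapes:", long_cov.shape == short_cov.shape)
```

Because the statistics have the same shape for a 200-step trajectory and a 5-step one, they can be compared directly. That is one plausible reading of "length-invariant distillation."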
Method
LoRi distills reasoning by aligning teacher and student hidden-state trajectories in a shared low-rank subspace using first- and second-order statistics, combining rationale-level and anchor-level alignment.
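A minimal sketch of statistic matching in a shared low-rank subspace follows. It rests on simplifying assumptions that are not taken from the paper:

- It uses a single layer's hidden states instead of a tensor over layers.
- The subspace comes from the teacher trajectory's top singular vectors.
- The loss sums squared mean and covariance differences, weighted by a hypothetical `cov_weight`.

It does not model the paper's rationale-level versus anchor-level split. It also uses NumPy for readability; a real training loop would compute the same quantities in a differentiable framework so that gradients reach the student.

```python
import numpy as np


def teacher_basis(teacher_traj: np.ndarray, rank: int) -> np.ndarray:
    """Orthonormal (hidden_dim x rank) basis from the teacher's top singular vectors."""
    centered = teacher_traj - teacher_traj.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T


def projected_stats(traj: np.ndarray, basis: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """First- and second-order statistics of a trajectory inside the subspace."""
    z = traj @ basis  # (steps x rank) subspace coordinates
    mean = z.mean(axis=0)
    centered = z - mean
    return mean, centered.T @ centered / z.shape[0]


def lowrank_alignment_loss(teacher_traj: np.ndarray, student_traj: np.ndarray,
                           rank: int = 8, cov_weight: float = 1.0) -> float:
    """Squared mean gap plus weighted squared Frobenius gap between covariances."""
    basis = teacher_basis(teacher_traj, rank)
    t_mean, t_cov = projected_stats(teacher_traj, basis)
    s_mean, s_cov = projected_stats(student_traj, basis)
    return float(np.sum((t_mean - s_mean) ** 2) + cov_weight * np.sum((t_cov - s_cov) ** 2))


rng = np.random.default_rng(1)
teacher = rng.normal(size=(120, 64))  # hypothetical hidden states over 120 explicit CoT tokens
student = rng.normal(size=(5, 64))    # hypothetical hidden states over 5 latent steps
print("teacher vs student:", lowrank_alignment_loss(teacher, student))
print("teacher vs itself:", lowrank_alignment_loss(teacher, teacher))
```

The teacher and student trajectories have different lengths (120 vs 5 steps). The loss is still well defined because only rank-sized statistics are compared.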
In practice
- Precompute teacher low-rank factors for efficient training.
- Use 5 latent steps, the setting reported to compress reasoning best.
- Distill with as few as 128 training samples.
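The teacher is frozen during distillation, so its low-rank factors can be computed once and reused in every epoch. That is one plausible reading of the first tip. The sketch below caches the teacher basis and its projected mean and covariance for each training example. It then scores a 5-step student trajectory against the cache. The following are all assumptions for illustration:

- the cache layout (`TeacherFactors`);
- the function names;
- the rank of 8;
- the synthetic 128-example dataset, sized to echo the third tip.

```python
from typing import NamedTuple

import numpy as np


class TeacherFactors(NamedTuple):
    basis: np.ndarray  # (hidden_dim x rank) orthonormal subspace
    mean: np.ndarray   # (rank,) projected mean over teacher steps
    cov: np.ndarray    # (rank x rank) projected covariance over teacher steps


def _projected_stats(traj: np.ndarray, basis: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    z = traj @ basis
    mean = z.mean(axis=0)
    centered = z - mean
    return mean, centered.T @ centered / z.shape[0]


def precompute_teacher_factors(teacher_trajs: list[np.ndarray], rank: int) -> list[TeacherFactors]:
    """One SVD per example, run once before training because the teacher is frozen."""
    cache = []
    for traj in teacher_trajs:
        centered = traj - traj.mean(axis=0, keepdims=True)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        basis = vt[:rank].T
        mean, cov = _projected_stats(traj, basis)
        cache.append(TeacherFactors(basis, mean, cov))
    return cache


def student_loss(student_traj: np.ndarray, factors: TeacherFactors) -> float:
    """Costs one projection plus small rank x rank statistics; no teacher SVD per step."""
    s_mean, s_cov = _projected_stats(student_traj, factors.basis)
    return float(np.sum((factors.mean - s_mean) ** 2) + np.sum((factors.cov - s_cov) ** 2))


rng = np.random.default_rng(2)
hidden_dim, rank, latent_steps = 64, 8, 5
teacher_trajs = [rng.normal(size=(int(n), hidden_dim)) for n in rng.integers(40, 160, size=128)]
cache = precompute_teacher_factors(teacher_trajs, rank)

for epoch in range(2):
    total = sum(student_loss(rng.normal(size=(latent_steps, hidden_dim)), f) for f in cache)
    print(f"epoch {epoch}: mean loss {total / len(cache):.3f}")
```

The random student trajectories here only exercise the loop. In practice they would come from the student model's latent steps.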
Topics
- Low-Rank Distillation
- Implicit Chain-of-Thought
- LLM Reasoning
- Model Efficiency
- Mathematical Benchmarks
- LLaMA, Qwen
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.