LoRi: Low-Rank Distillation for Implicit Reasoning

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

LoRi (Low-Rank Distillation for Implicit Reasoning) is a framework that addresses the accuracy gap between implicit chain-of-thought (iCoT) methods and explicit CoT prompting in large language models (LLMs). The authors find empirically that hidden-state reasoning trajectories exhibit low-rank structure. LoRi exploits this by aligning teacher and student reasoning trajectories in a shared low-rank tensor subspace using first- and second-order statistics, which captures global reasoning structure while keeping the student's latent process compact. Evaluated on LLaMA and Qwen models (0.5B to 8B parameters) on mathematical reasoning benchmarks such as GSM8K-Hard, LoRi consistently improves accuracy by up to ~12%, narrows the gap to explicit CoT, and outperforms prior iCoT distillation methods. It also reports substantial training-efficiency gains and 5.1x to 6.9x faster inference.

Key takeaway

For machine learning engineers optimizing LLM inference, LoRi is a way to approach explicit Chain-of-Thought accuracy at much lower inference cost. It distills multi-step reasoning into a compact latent process, improving performance on tasks like GSM8K-Hard by up to ~10% while running 5.1x to 6.9x faster at inference. Consider LoRi when you need stronger reasoning from a model but cannot afford to generate explicit reasoning tokens.

Key insights

Reasoning trajectories in LLMs exhibit low-rank structure, enabling efficient distillation into compact latent processes.

Principles

Hidden-state reasoning trajectories have low-rank structure.
Global reasoning geometry can be transferred efficiently.
Low-rank alignment supports length-invariant distillation.
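
One way to make the low-rank claim concrete is to inspect the singular-value spectrum of a trajectory matrix whose rows are per-step hidden states. The sketch below is illustrative only: the synthetic trajectories, the 99% energy threshold, and the helper name `effective_rank` are assumptions made for this example, not the paper's measurement protocol.

```python
import numpy as np

def effective_rank(H, energy=0.99):
    """Smallest k such that the top-k singular values of the centered
    trajectory hold at least `energy` of the total squared spectrum."""
    Hc = H - H.mean(axis=0, keepdims=True)        # remove the mean hidden state
    s = np.linalg.svd(Hc, compute_uv=False)       # singular values, descending
    cum = np.cumsum(s**2) / np.sum(s**2)          # cumulative energy fraction
    return int(np.searchsorted(cum, energy) + 1)

rng = np.random.default_rng(0)
T, d, r = 64, 256, 4                              # steps, hidden size, true rank (all hypothetical)

# Synthetic stand-in for a reasoning trajectory: rank-r signal plus small noise.
H_noisy = rng.normal(size=(T, r)) @ rng.normal(size=(r, d))
H_noisy += 1e-3 * rng.normal(size=(T, d))

# Baseline with no low-rank structure at all.
H_full = rng.normal(size=(T, d))

k_low = effective_rank(H_noisy)
k_full = effective_rank(H_full)
print(k_low, k_full)
```

On real models, `H` would be the stacked hidden states of a chosen layer over the reasoning tokens; a small effective rank relative to `min(T, d)` is what the low-rank observation predicts.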

Method

LoRi distills reasoning by aligning teacher and student hidden-state trajectories in a shared low-rank subspace using first- and second-order statistics, combining rationale-level and anchor-level alignment.
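
A minimal sketch of what aligning first- and second-order statistics in a shared low-rank subspace could look like follows. It is not the authors' loss: the basis choice (top right singular vectors of the teacher trajectory), the equal weighting of the two terms, the assumption that teacher and student share a hidden size, and the function names are all hypothetical. The rationale-level and anchor-level components are not modeled here.

```python
import numpy as np

def low_rank_basis(H, rank):
    """Top-`rank` right singular vectors of the centered trajectory, shape (d, rank)."""
    Hc = H - H.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Vt[:rank].T

def moment_alignment_loss(H_teacher, H_student, basis):
    """Squared gap between the means and covariances of the two
    trajectories after projecting both onto the same basis."""
    Zt = H_teacher @ basis                         # (T_teacher, rank)
    Zs = H_student @ basis                         # (T_student, rank)
    mean_gap = np.sum((Zt.mean(axis=0) - Zs.mean(axis=0)) ** 2)   # first order
    cov_t = np.atleast_2d(np.cov(Zt, rowvar=False, bias=True))
    cov_s = np.atleast_2d(np.cov(Zs, rowvar=False, bias=True))
    cov_gap = np.sum((cov_t - cov_s) ** 2)                        # second order
    return float(mean_gap + cov_gap)

rng = np.random.default_rng(0)
d, rank = 64, 4
H_teacher = rng.normal(size=(40, rank)) @ rng.normal(size=(rank, d))  # 40 explicit steps
H_student = rng.normal(size=(5, d))                                   # 5 latent steps

basis = low_rank_basis(H_teacher, rank)
loss_same = moment_alignment_loss(H_teacher, H_teacher, basis)
loss_student = moment_alignment_loss(H_teacher, H_student, basis)
print(loss_same, loss_student)
```

Because the means and covariances are taken over the step axis, the loss is defined even when the teacher and student trajectories have different lengths, which is the property that makes this kind of alignment length-invariant.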

In practice

Precompute teacher low-rank factors for efficient training.
Use 5 latent steps for optimal reasoning compression.
Distill with as few as 128 training samples.
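
The precomputation tip above can be sketched as a one-time pass that stores truncated SVD factors for each teacher trajectory instead of the dense hidden states. Everything here is an assumption for illustration: the synthetic trajectories, the rank of 4, the in-memory list used as a cache, and the helper names. The referenced repository may organize this differently.

```python
import numpy as np

def truncated_factors(H, rank):
    """Rank-`rank` SVD factors (U, s, Vt) of one trajectory matrix."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank]

def precompute_teacher_cache(trajectories, rank):
    """Run the teacher-side factorization once, before student training starts."""
    return [truncated_factors(H, rank) for H in trajectories]

rng = np.random.default_rng(0)
d, rank, n_samples = 128, 4, 128                  # hidden size, rank, training samples
lengths = rng.integers(20, 60, size=n_samples)    # variable-length rationales

# Synthetic stand-ins for teacher hidden-state trajectories (exactly rank 4).
trajectories = [rng.normal(size=(int(T), rank)) @ rng.normal(size=(rank, d))
                for T in lengths]

cache = precompute_teacher_cache(trajectories, rank)

# Reconstruct one trajectory from its factors to check nothing was lost.
U0, s0, Vt0 = cache[0]
max_err = float(np.max(np.abs((U0 * s0) @ Vt0 - trajectories[0])))

stored = sum(U.size + s.size + Vt.size for (U, s, Vt) in cache)
dense = sum(H.size for H in trajectories)
print(max_err, stored, dense)
```

With factors cached this way, the teacher forward pass and the SVD are paid once per sample rather than once per training step, and the stored footprint scales with the rank rather than with the full trajectory size.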

Topics

Low-Rank Distillation
Implicit Chain-of-Thought
LLM Reasoning
Model Efficiency
Mathematical Benchmarks
LLaMA, Qwen

Code references

rmsolgi/lori

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.