Learning to Select, Not Relearn: Hard-Routed Mixtures of Reasoning LoRAs
Summary
The Hard-Routed MoR-LoRA framework, published on 2026-06-30, introduces a two-stage method for composing independently trained LoRA adapters into a single large language model. It targets multi-domain adaptation when the original training data is unavailable. Common MoE-style soft routing rescales each adapter's contribution by a routing weight, which can disrupt the unit-scale additive updates that frozen LoRA modules were trained to apply; this approach instead uses unit-scale hard selection. In the first stage, domain-specific LoRA adapters are trained as reasoning experts using reinforcement learning from verifiable feedback. In the second stage, these experts are frozen, their reasoning traces are distilled, and only a lightweight shared router and a small attention LoRA are trained to integrate them. The router makes a hard top-1 selection of one expert per token, and a straight-through estimator keeps that discrete choice trainable with gradients. Experiments across five benchmarks, multiple model scales, and several model families show that Hard-Routed MoR-LoRA preserves expert behavior while using significantly fewer trainable parameters than soft-routing mixture baselines. Analysis suggests that soft mixtures often concentrate routing on a single expert anyway, which supports hard unit-scale routing as an efficient way to compose frozen LoRA experts.
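The observation that soft mixtures tend to concentrate on one expert can be checked on any router's logits. The sketch below is illustrative only and is not taken from the paper: the function names `top1_mass` and `normalized_entropy` and the example logits are hypothetical, and the source does not say which statistic the authors used.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top1_mass(per_token_logits):
    """Mean probability that soft routing gives to its top expert per token."""
    masses = [max(softmax(z)) for z in per_token_logits]
    return sum(masses) / len(masses)

def normalized_entropy(probs):
    """Routing entropy divided by log(K): 0 means one expert, 1 means uniform."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

# Hypothetical router logits for two tokens and three experts.
peaked_logits = [[8.0, 0.0, 0.0], [0.0, 9.0, 1.0]]
concentration = top1_mass(peaked_logits)
```

A top-1 mass close to 1 (or a normalized entropy close to 0) means the soft mixture already behaves almost like a hard selection, except that the chosen expert's update is still scaled slightly below 1.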
Key takeaway
Machine learning engineers composing LoRA adapters for multi-domain large language models should consider Hard-Routed MoR-LoRA. The framework integrates frozen reasoning experts through hard selection, a parameter-efficient approach that preserves their original behavior better than soft-routing mixtures. The two-stage approach can significantly reduce trainable parameters while maintaining performance across diverse domains, especially when the original training data cannot be shared.
Key insights
Hard-Routed MoR-LoRA efficiently composes frozen LoRA experts using hard selection, preserving behavior with fewer parameters.
Principles
- Unit-scale hard selection maintains LoRA update integrity.
- Distilling reasoning traces enables lightweight router training.
- Hard top-1 routing can be more efficient than soft mixtures.
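One way to read these principles is in standard LoRA notation. The symbols below ($W_0$, $A_k$, $B_k$, $p_k$, $K$, $g$, and the stop-gradient operator $\mathrm{sg}$) are assumptions introduced here for illustration, not the paper's own notation, and the source does not say which straight-through variant is used; the last line shows one common form.

```latex
% Frozen base weight W_0 and K frozen LoRA experts, each contributing B_k A_k x.
% Soft routing: every expert update is rescaled by a routing weight p_k(x) in (0, 1).
h_{\text{soft}} = W_0 x + \sum_{k=1}^{K} p_k(x)\, B_k A_k x,
\qquad \sum_{k=1}^{K} p_k(x) = 1

% Hard top-1 routing: the selected expert's update is applied at scale 1.
h_{\text{hard}} = W_0 x + B_{k^*} A_{k^*} x,
\qquad k^* = \arg\max_{k} p_k(x)

% A common straight-through gate: one-hot in the forward pass,
% gradient of p in the backward pass.
g = \operatorname{onehot}(k^*) + p - \mathrm{sg}(p)
```

Under this reading, a frozen expert trained with its update at scale 1 sees the same scale at composition time only in the hard case.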
Method
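First, train domain-specific LoRA experts with reinforcement learning. Then freeze the experts and distill their reasoning traces. Finally, train a lightweight shared router and a small attention LoRA; the router makes a hard top-1 expert selection per token and is trained with a straight-through estimator.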
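The routing step in this method can be sketched in plain Python for a single token. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names, the toy logits, and the toy expert updates are hypothetical, and the backward pass uses the common "softmax surrogate" form of the straight-through estimator, which the source does not confirm.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hard_top1_forward(logits):
    """Forward pass: one-hot gates for the top expert, plus the soft probabilities."""
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    gates = [1.0 if i == k else 0.0 for i in range(len(probs))]
    return gates, probs

def ste_backward(grad_gates, probs):
    """Backward pass (assumed STE variant): treat the gates as if they were
    the softmax probabilities and backpropagate through the softmax."""
    dot = sum(g * p for g, p in zip(grad_gates, probs))
    return [p * (g - dot) for g, p in zip(grad_gates, probs)]

def route(base_out, expert_updates, gates):
    """Add each expert's update to the base output, weighted by its gate."""
    out = list(base_out)
    for gate, update in zip(gates, expert_updates):
        out = [o + gate * u for o, u in zip(out, update)]
    return out

# Hypothetical router logits and frozen expert updates for one token.
logits = [0.2, 1.5, -0.3]
base_out = [1.0, 1.0]
expert_updates = [[0.1, 0.0], [0.5, -0.5], [0.0, 0.2]]

gates, probs = hard_top1_forward(logits)
hard_out = route(base_out, expert_updates, gates)  # chosen update at scale 1
soft_out = route(base_out, expert_updates, probs)  # every update scaled by p_k
grad_logits = ste_backward([0.3, -0.1, 0.2], probs)  # toy upstream gradient
```

With hard gates, the selected expert's update reaches the output unchanged, while the soft version shrinks it by its routing probability and mixes in the other experts. The router logits still receive a nonzero gradient through the surrogate, which is what makes the discrete choice trainable.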
In practice
- Integrate diverse LoRA experts without retraining.
- Reduce trainable parameters for multi-domain LLMs.
- Apply hard routing for frozen adapter composition.
Topics
- LoRA Adapters
- Mixture-of-Experts
- Hard Routing
- Reinforcement Learning
- LLM Adaptation
- Parameter Efficiency
Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.