Learning to Select, Not Relearn: Hard-Routed Mixtures of Reasoning LoRAs

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The \textbf{Hard-Routed MoR-LoRA} framework, published on 2026-06-30, introduces a two-stage method for composing independently trained LoRA adapters into a single large language model, addressing multi-domain adaptation challenges where original training data is unavailable. Unlike common MoE-style soft routing, which can disrupt the unit-scale additive updates of frozen LoRA modules, this approach employs unit-scale hard selection. Initially, domain-specific LoRA adapters are trained as reasoning experts using reinforcement learning from verifiable feedback. Subsequently, these experts are frozen, their reasoning traces distilled, and only a lightweight shared router along with a small attention LoRA is trained for integration. The router utilizes hard top-1 selection for one expert per token, enabled by a straight-through estimator for gradient-based training. Experiments across five benchmarks, multiple model scales, and various model families demonstrate that \textbf{Hard-Routed MoR-LoRA} effectively preserves expert behavior while significantly reducing trainable parameters compared to soft-routing mixture baselines. Analysis suggests soft mixtures often concentrate routing on a single expert, validating the efficiency of hard unit-scale routing for frozen LoRA expert composition.

Key takeaway

For Machine Learning Engineers composing LoRA adapters for multi-domain large language models, you should consider Hard-Routed MoR-LoRA. This framework provides a parameter-efficient method for integrating frozen reasoning experts through hard selection, preserving their original behavior better than soft-routing mixtures. Implementing this two-stage approach can significantly reduce trainable parameters while maintaining performance across diverse domains, especially when original training data cannot be shared.

Key insights

Hard-Routed MoR-LoRA efficiently composes frozen LoRA experts using hard selection, preserving behavior with fewer parameters.

Principles

Method

Train domain-specific LoRA experts via RL. Freeze experts, distill traces. Train lightweight router and attention LoRA for hard top-1 selection per token using a straight-through estimator.

In practice

Topics

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.