Learning to Route LLMs from Implicit Cost-Performance Preferences via Meta-Learning

2025-07-11 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

MetaRouter is a meta-learning framework for preference-aware Large Language Model (LLM) routing that addresses the trade-off between model performance and computational cost. Existing routing solutions struggle with diverse user cost-performance preferences and often require manual configuration or costly retraining. MetaRouter instead learns implicit user preferences from minimal interaction by framing each distinct preference profile as a contextual bandit task. A context encoder infers a latent preference representation from pairwise user feedback, and a policy network uses that representation to route queries. Meta-training, which incorporates noise injection and entropy regularization, enables rapid adaptation to new profiles. In experiments, MetaRouter outperforms strong baselines on both in-distribution and out-of-distribution tasks, including hybrid QA, code generation, and mathematical reasoning. It needs only about six contexts to perform strongly, is robust to changes in the set of routable LLMs, and scales to multi-model scenarios involving up to five LLMs.

Key takeaway

AI engineers building LLM applications for users with varied needs should consider preference-aware routing. MetaRouter demonstrates that meta-learning can adapt to implicit user cost-performance trade-offs from only about six feedback contexts. A router built this way can select an LLM per query to reduce cost while maintaining performance, without manual configuration or retraining for each user profile.

Key insights

Meta-learning enables LLM routers to adapt to diverse user cost-performance preferences with minimal interaction.

Principles

LLM routing involves a performance-cost trade-off.
User preferences for LLM routing are heterogeneous.
Implicit preferences can be learned from minimal feedback.
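
One common way to make the trade-off concrete is a scalar utility that subtracts weighted cost from expected quality. The sketch below is illustrative only and is not taken from the paper: the `Candidate` type, the model names, the quality and cost numbers, and the `cost_weight` parameter are all hypothetical. It shows how two users with different cost weights end up routed to different models for the same query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str       # hypothetical model label
    quality: float  # assumed expected answer quality in [0, 1]
    cost: float     # assumed normalized cost in [0, 1]


def route(candidates, cost_weight):
    """Pick the candidate that maximizes quality - cost_weight * cost."""
    return max(candidates, key=lambda c: c.quality - cost_weight * c.cost)


# Made-up numbers for illustration, not measurements from the paper.
CANDIDATES = [
    Candidate("small-model", quality=0.70, cost=0.10),
    Candidate("large-model", quality=0.90, cost=0.80),
]

# A quality-focused user tolerates cost; a budget-focused user does not.
quality_first = route(CANDIDATES, cost_weight=0.1)
budget_first = route(CANDIDATES, cost_weight=1.0)

print(quality_first.name, budget_first.name)
```

In this framing, the heterogeneity of user preferences is a per-user `cost_weight` that the router is never told directly, which is why it has to be inferred from feedback.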

Method

MetaRouter uses a context encoder for latent preference inference from pairwise comparisons and a policy network for query routing, jointly trained via meta-learning with entropy regularization and noise injection.
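
The two components can be pictured with a forward pass. The following is a minimal NumPy sketch under stated assumptions, not the paper's architecture: the dimensions, the mean-pooling encoder, the single linear policy layer, and the random weights standing in for meta-trained parameters are all hypothetical. It encodes six pairwise-feedback feature vectors into a latent preference vector (with Gaussian noise added as one possible form of noise injection), then produces routing probabilities and the entropy term that an entropy-regularized objective would include.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MODELS = 3      # number of routable LLMs (hypothetical)
QUERY_DIM = 8     # query embedding size (hypothetical)
FEEDBACK_DIM = 4  # features describing one pairwise comparison (hypothetical)
LATENT_DIM = 5    # size of the latent preference vector (hypothetical)

# Random weights stand in for parameters that meta-training would learn.
W_enc = rng.normal(scale=0.1, size=(FEEDBACK_DIM, LATENT_DIM))
W_pol = rng.normal(scale=0.1, size=(QUERY_DIM + LATENT_DIM, N_MODELS))


def encode_context(feedback, noise_std=0.0):
    """Mean-pool per-comparison embeddings into one latent preference vector."""
    z = np.tanh(feedback @ W_enc).mean(axis=0)
    if noise_std > 0:
        # Assumed form of noise injection: Gaussian noise on the latent.
        z = z + rng.normal(scale=noise_std, size=z.shape)
    return z


def policy(query, z):
    """Softmax routing distribution over models, conditioned on query and latent."""
    logits = np.concatenate([query, z]) @ W_pol
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def entropy(probs):
    """Shannon entropy of the routing distribution (the regularization term)."""
    return float(-(probs * np.log(probs + 1e-12)).sum())


feedback = rng.normal(size=(6, FEEDBACK_DIM))  # six feedback contexts
query = rng.normal(size=QUERY_DIM)

z = encode_context(feedback, noise_std=0.05)
probs = policy(query, z)
chosen = int(np.argmax(probs))

print(probs, entropy(probs), chosen)
```

Mean pooling makes the latent independent of the order in which feedback arrives, which is one reasonable design choice when a profile is summarized from only a handful of comparisons.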

In practice

Use pairwise comparisons for preference feedback.
Employ meta-learning for rapid adaptation to user needs.
Incorporate noise injection for robust model generalization.
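
To see why a handful of pairwise comparisons can pin down a preference, the sketch below swaps in a much simpler technique than meta-learning: a Bradley-Terry style likelihood with a grid search over a single cost weight. Everything here is an assumption made for illustration, including the linear utility, the `temperature` value, the six synthetic comparison pairs, and the use of random label flips as a stand-in for noisy feedback.

```python
import math
import random


def utility(quality, cost, cost_weight):
    """Assumed linear utility: quality minus weighted cost."""
    return quality - cost_weight * cost


def simulate_feedback(pairs, true_weight, flip_prob, rng):
    """Label each pair 1 if option a is preferred, else 0, with random flips."""
    labels = []
    for a, b in pairs:
        prefers_a = utility(*a, true_weight) > utility(*b, true_weight)
        if rng.random() < flip_prob:  # injected label noise
            prefers_a = not prefers_a
        labels.append(1 if prefers_a else 0)
    return labels


def log_likelihood(pairs, labels, weight, temperature=0.05):
    """Bradley-Terry log-likelihood of the labels under a candidate weight."""
    total = 0.0
    for (a, b), y in zip(pairs, labels):
        margin = (utility(*a, weight) - utility(*b, weight)) / temperature
        p_a = 1.0 / (1.0 + math.exp(-margin))
        p_a = min(max(p_a, 1e-9), 1.0 - 1e-9)  # avoid log(0)
        total += math.log(p_a) if y else math.log(1.0 - p_a)
    return total


def fit_cost_weight(pairs, labels, grid):
    """Return the grid value with the highest log-likelihood."""
    return max(grid, key=lambda w: log_likelihood(pairs, labels, w))


# Six synthetic comparisons: (quality, cost) of a cheap option a vs. a pricier b.
PAIRS = [
    ((0.60, 0.2), (0.65, 0.7)),
    ((0.60, 0.2), (0.70, 0.7)),
    ((0.60, 0.2), (0.80, 0.7)),
    ((0.60, 0.2), (0.90, 0.7)),
    ((0.55, 0.2), (0.95, 0.7)),
    ((0.45, 0.2), (0.95, 0.7)),
]
GRID = [i / 10 for i in range(16)]  # candidate cost weights 0.0 .. 1.5

clean_labels = simulate_feedback(PAIRS, 0.5, 0.0, random.Random(0))
noisy_labels = simulate_feedback(PAIRS, 0.5, 0.2, random.Random(0))

estimate_clean = fit_cost_weight(PAIRS, clean_labels, GRID)
estimate_noisy = fit_cost_weight(PAIRS, noisy_labels, GRID)

print(estimate_clean, estimate_noisy)
```

Comparing `estimate_clean` with `estimate_noisy` gives a rough feel for how sensitive a small feedback set is to mislabeled comparisons, which is the kind of fragility that training with injected noise is meant to reduce.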

Topics

LLM Routing
Meta-Learning
Contextual Bandits
Cost-Performance Optimization
User Preference Learning
Hybrid LLM Systems

Best for: Research Scientist, AI Architect, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.