Scaling Enterprise Agent Routing: Degradation, Diagnosis, and Recovery

2026-06-16 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, quick

Summary

A study on scaling enterprise agent routing for production LLM assistants finds that routing accuracy degrades significantly as tool catalogs grow. Researchers evaluated three frontier models on a 110-agent, 584-tool catalog from a deployed enterprise productivity assistant. Routing F1 on under-specified requests dropped 16-23 percentage points across models when the catalog scaled from 10 to 110 agents. An oracle analysis attributed the degradation to both a retrieval gap and a confusion gap, with the oracle ceiling itself dropping 10 percentage points. Embedding-based shortlisting recovered 10-11 percentage points of F1 at full scale across all models and providers. A subsequent production annotation study of 1,435 human-labeled utterances confirmed a recovery of 10-17 percentage points on real traffic, albeit with absolute performance 10-15 percentage points lower.

Key takeaway

AI engineers scaling LLM-powered enterprise assistants should expect routing accuracy to degrade as the agent and tool catalog grows. Embedding-based shortlisting recovers much of the lost F1: the study reports gains of 10-17 percentage points on real traffic. Decompose routing failures into retrieval and confusion gaps so that optimization effort targets the stage that is actually failing.

Key insights

Scaling LLM agent routing degrades accuracy due to retrieval and confusion, but embedding-based shortlisting significantly recovers performance.

Principles

Routing accuracy degrades with scale.
Degradation stems from retrieval and confusion.
Embedding shortlisting boosts routing F1.

Method

Embedding-based shortlisting is applied to recover routing F1. This involves using embeddings to pre-filter or rank potential tools before the LLM makes a final routing decision, mitigating retrieval and confusion gaps.
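
The shortlisting step described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the summary does not say which embedding model, similarity measure, or shortlist size was used, so the bag-of-words `embed` function, the cosine scoring, the value of `k`, and the four-agent catalog are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model (assumption): term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def shortlist(request: str, catalog: dict[str, str], k: int = 3) -> list[str]:
    # Rank every agent description against the request and keep the top k.
    query = embed(request)
    ranked = sorted(
        catalog,
        key=lambda name: cosine(query, embed(catalog[name])),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical catalog: agent name -> description.
catalog = {
    "calendar_agent": "schedule meetings and manage calendar events",
    "email_agent": "send read and search email messages",
    "expense_agent": "submit expense reports and receipts",
    "travel_agent": "book flights hotels and travel itineraries",
}

# Only the shortlisted agents would be shown to the LLM router.
candidates = shortlist("schedule a meeting with finance", catalog, k=2)
```

In a real deployment the descriptions would be embedded once and cached, and the LLM would make the final routing decision over `candidates` rather than over the full catalog.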

In practice

Implement embedding shortlisting for large agent catalogs.
Decompose routing errors into retrieval and confusion.
Evaluate F1 on under-specified routing requests.
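
The error decomposition and F1 evaluation listed above might look like the sketch below. The summary does not give the study's exact definitions, so two assumptions are made here: a "retrieval" error means the correct agent never reached the shortlist, and a "confusion" error means it was shortlisted but the router picked another agent. The set-based per-request F1, the function names, and the sample records are likewise illustrative.

```python
from collections import Counter

def routing_f1(predicted: set[str], gold: set[str]) -> float:
    # Set-based F1 for one request (assumed metric definition).
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

def classify_outcome(gold: str, shortlisted: list[str], chosen: str) -> str:
    # Assumed definitions: a retrieval error means the gold agent was never
    # shortlisted; a confusion error means it was available but not chosen.
    if chosen == gold:
        return "correct"
    if gold not in shortlisted:
        return "retrieval"
    return "confusion"

# Hypothetical evaluation records: (gold agent, shortlist, router's choice).
records = [
    ("calendar_agent", ["calendar_agent", "email_agent"], "calendar_agent"),
    ("expense_agent", ["travel_agent", "email_agent"], "travel_agent"),
    ("email_agent", ["email_agent", "calendar_agent"], "calendar_agent"),
]

outcome_counts = Counter(classify_outcome(g, s, c) for g, s, c in records)
```

A high share of retrieval errors points at the shortlisting stage (embedding quality or shortlist size), while a high share of confusion errors points at overlapping agent descriptions or the router prompt.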

Topics

LLM Agent Routing
Enterprise AI Assistants
Embedding Shortlisting
Routing Accuracy
Tool Catalogs
Performance Degradation

Best for: AI Architect, NLP Engineer, AI Scientist, AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.