Agents
Scaling Enterprise Agent Routing: Degradation, Diagnosis, and Recovery
The study evaluates how the routing accuracy of production LLM assistants changes as the catalog of specialized tools expands, focusing on a 110-agent, 584-tool setup. Routing F1 drops by 16-23 percentage points, a loss the study attributes to retrieval and confusion gaps. Embedding-based shortlisting improves F1 by 10-11 percentage points across all models tested. The results show that the choice of routing strategy matters for maintaining accuracy as tool catalogs scale, and they give practitioners concrete guidance for deploying LLM assistants.
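To make the idea of embedding-based shortlisting concrete, here is a minimal sketch in Python. It is not the study's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the catalog entries, the function names, and the shortlist size `k` are all hypothetical. The point is only the shape of the technique: score every tool description against the query, keep the top few, and hand the router that short list instead of the full catalog.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: a real system would call a learned text-embedding model here.
    """
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shortlist(query: str, catalog: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    q = embed(query)
    ranked = sorted(
        catalog,
        key=lambda name: cosine(q, embed(catalog[name])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical four-tool catalog; the setup in the study has 584 tools.
CATALOG = {
    "create_invoice": "create a new invoice for a customer order",
    "reset_password": "reset the password for a user account",
    "book_meeting": "schedule a meeting on the calendar",
    "refund_order": "issue a refund for a customer order",
}

candidates = shortlist("refund the customer for a damaged order", CATALOG, k=2)
print(candidates)
```

Only the shortlisted candidates would then be passed to the LLM router, which shrinks the set of tools it can confuse with one another. The value of `k` is a trade-off: too small and the correct tool may be missing from the shortlist (a retrieval gap), too large and near-duplicate tools reappear (a confusion gap).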
routing, LLM, enterprise