Inference
Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability
The article presents a framework for diagnosing wasted computation in multi-agent LLM systems, specifically within a three-agent architecture consisting of an orchestrator, a search agent, and an execution agent. The framework uses trace-based, failure-aware observability to produce online signals about computation loops and information gain, complemented by offline semantic grounding metrics. In failed runs, 58.1% of tokens are spent after the initial warning. In a pilot study, warning-based interventions reduced unnecessary token usage from 63.8% to 30.4%, which suggests the framework can help cut wasted computation in multi-agent systems.
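To make the "tokens spent after the initial warning" figure concrete, the following is a minimal sketch of how such a share could be computed from a trace. It is not the article's implementation: the `Step` record, the repetition-based loop warning, and the `window` and `max_repeats` thresholds are all hypothetical stand-ins chosen for illustration, and the trace is toy data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One trace event (hypothetical schema, not taken from the article)."""
    agent: str    # e.g. "orchestrator", "search", "execution"
    action: str   # normalized description of what the agent did
    tokens: int   # tokens consumed by this step


def first_loop_warning(steps: List[Step], window: int = 4,
                       max_repeats: int = 2) -> Optional[int]:
    """Return the index of the first step whose (agent, action) pair occurs
    more than `max_repeats` times within the last `window` steps, else None.

    This is an assumed, simplistic loop signal used only for illustration.
    """
    for i, step in enumerate(steps):
        recent = steps[max(0, i - window + 1): i + 1]
        key = (step.agent, step.action)
        repeats = sum(1 for s in recent if (s.agent, s.action) == key)
        if repeats > max_repeats:
            return i
    return None


def post_warning_token_share(steps: List[Step],
                             warning_index: Optional[int]) -> float:
    """Fraction of all tokens spent strictly after the warning step."""
    total = sum(s.tokens for s in steps)
    if warning_index is None or total == 0:
        return 0.0
    after = sum(s.tokens for s in steps[warning_index + 1:])
    return after / total


# Toy trace: the search agent repeats the same query three times.
trace = [
    Step("orchestrator", "plan", 100),
    Step("search", "query:foo", 200),
    Step("search", "query:foo", 200),
    Step("search", "query:foo", 200),
    Step("execution", "run", 300),
]

warning_at = first_loop_warning(trace)
share = post_warning_token_share(trace, warning_at)
print(warning_at, round(share, 3))
```

In this toy trace the warning fires on the third repeated query, and the tokens spent afterwards are the ones an intervention at warning time could have saved. Averaging this share over failed runs would give a statistic of the same kind as the 58.1% reported above, under whatever warning definition the framework actually uses.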
observability, multi-agent, llm