SafetyarXiv cs.AI — 7 d ago

When Errors Become Narratives: A Longitudinal Taxonomy of Silent Failures in a Production LLM Agent Runtime

This study presents a longitudinal analysis of silent failures in a production-level personal-assistant LLM agent system, operational since March 2026, which encompasses 40 scheduled jobs and integrates multiple LLM providers. The researchers documented 22 incidents over eight weeks, identifying a unique failure type termed "fail-plausible," where the LLM generates misleading narratives instead of reporting errors, highlighting the need for improved error visibility in LLM systems. The findings emphasize that traditional testing and audits are insufficient for preventing such failures, advocating for a defense framework that ensures failures are detectable and accountable, ultimately guiding the design of more robust agent systems.

llmfailurestaxonomyrelevance 0.00 · engagement 0.00

Read at source ↗← all news