Agents
IHBench: Evaluating Post-Interruption Recovery in Voice Agents with Structured Workflows
IHBench (Interruption Handling Benchmark) evaluates how well voice agents recover after an interruption while carrying out structured workflows across 10 enterprise domains. It tests 27 audio-language model configurations, spanning OpenAI, Google, and open-weight models, on six types of interruptions, and scores both task fulfillment and recovery quality. Closed-weight models handle interruptions better than open-weight models and degrade more slowly as conversations grow longer. These findings highlight interruption recovery as an important concern for practitioners building robust voice agents.
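The summary does not say how IHBench quantifies degradation with conversation length. As an illustration only, one simple way to compare "degrading at a slower rate" across models is the least-squares slope of a per-conversation score against conversation length: a slope closer to zero means slower degradation. The function name `degradation_slope` and the choice of a linear fit are assumptions for this sketch, not the benchmark's published method.

```python
def degradation_slope(lengths: list[float], scores: list[float]) -> float:
    """Ordinary least-squares slope of score vs. conversation length.

    Illustrative sketch (hypothetical helper, not IHBench's metric):
    a more negative slope means the score falls faster as conversations
    get longer; a slope near zero means little degradation.
    """
    n = len(lengths)
    if n != len(scores) or n < 2:
        raise ValueError("need at least two paired (length, score) observations")
    mean_x = sum(lengths) / n
    mean_y = sum(scores) / n
    # Variance term of the fit; zero if every conversation has the same length.
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    if sxx == 0:
        raise ValueError("conversation lengths must not all be equal")
    # Covariance term between length and score.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, scores))
    return sxy / sxx
```

Given per-conversation scores for two models, the one whose slope is closer to zero degrades more slowly under this (assumed) linear summary. A linear fit is a deliberately coarse choice; if scores drop off non-linearly, comparing scores in length buckets may be more informative.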
voice agents, benchmark, interruption handling