Agents
EChO-Agent: Evidence Chain Orchestration Agent for Audio Reasoning
EChO-Agent is a modular agent framework for audio question answering (QA). It reformulates the task as a structured workflow with four stages: planning, tool execution, evidence integration, and answer verification. The design targets two limitations of large audio language models (LALMs): they struggle to focus on the relevant audio segments, and they do not expose a clear reasoning process. In experiments on the MMAR benchmark, EChO-Agent significantly improves both accuracy and rubric scores, and evidence integration is identified as a critical factor in its performance. This makes the framework relevant to practitioners working on audio reasoning tasks.
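To make the four-stage workflow concrete, here is a minimal Python sketch of how planning, tool execution, evidence integration, and answer verification might be chained. It is an illustration under stated assumptions, not EChO-Agent's actual implementation: every name in it (`Evidence`, `plan`, `execute`, `integrate`, `verify`, `answer_question`, the stub tools, and the file name `clip.wav`) is hypothetical, the planner is a toy keyword rule, and the tools are stand-ins for real audio models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# NOTE: all names here are hypothetical illustrations; the source does not
# describe EChO-Agent's real API, tools, or prompts.

Segment = Tuple[float, float]          # (start_s, end_s) of an audio span
Tool = Callable[[str, Segment], str]   # (audio_path, segment) -> textual finding


@dataclass
class Evidence:
    tool: str
    segment: Segment
    finding: str


def plan(question: str) -> List[Tuple[str, Segment]]:
    """Stage 1: pick tools and segments (toy keyword rule, assumed)."""
    steps: List[Tuple[str, Segment]] = [("transcribe", (0.0, 10.0))]
    if "sound" in question.lower():
        steps.append(("detect_events", (0.0, 10.0)))
    return steps


def execute(audio_path: str, steps: List[Tuple[str, Segment]],
            tools: Dict[str, Tool]) -> List[Evidence]:
    """Stage 2: run each planned tool on its segment."""
    return [Evidence(name, seg, tools[name](audio_path, seg)) for name, seg in steps]


def integrate(evidence: List[Evidence]) -> str:
    """Stage 3: merge tool findings into one readable evidence chain."""
    return "\n".join(
        f"[{e.tool} {e.segment[0]:.1f}-{e.segment[1]:.1f}s] {e.finding}"
        for e in evidence
    )


def verify(answer: str, chain: str) -> bool:
    """Stage 4: accept the answer only if the evidence chain supports it
    (here a naive substring check stands in for a real verifier)."""
    return answer.lower() in chain.lower()


def answer_question(question: str, audio_path: str, tools: Dict[str, Tool],
                    propose: Callable[[str, str], str]) -> dict:
    chain = integrate(execute(audio_path, plan(question), tools))
    answer = propose(question, chain)  # in practice this would be a model call
    return {"answer": answer, "verified": verify(answer, chain), "chain": chain}


# Stub tools and answer proposer so the sketch runs without any audio model.
STUB_TOOLS: Dict[str, Tool] = {
    "transcribe": lambda path, seg: "speaker says: the train is late",
    "detect_events": lambda path, seg: "dog barking",
}


def stub_propose(question: str, chain: str) -> str:
    return "dog barking" if "dog barking" in chain else "unknown"


result = answer_question("What sound is heard in the background?",
                         "clip.wav", STUB_TOOLS, stub_propose)
print(result["answer"], result["verified"])
```

Keeping each stage as a separate function mirrors the modular framing in the summary: a stage such as evidence integration can be swapped or ablated on its own, which is the kind of comparison that would show how much it contributes.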
audio, qa, agents