Agents · arXiv cs.CL · 7 d ago

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

EvoBrowseComp is a new benchmark for evaluating search agents, that is, large language models augmented with search capabilities. It comprises 800 complex questions (400 in English and 400 in Chinese) generated by a three-agent collaborative framework that synthesizes data from live web traversal, which keeps the queries contamination-free and temporally relevant. The benchmark addresses the limitations of static evaluation by providing a scalable, continuously updated test environment that better assesses the reasoning and retrieval abilities of these models, which makes it relevant to practitioners building robust search agents.

search-agents · benchmark · evolving-knowledge
