Agents
EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge
EvoBrowseComp is a new benchmark for evaluating search agents, that is, large language models augmented with search capabilities. It contains 800 complex questions (400 in English and 400 in Chinese) generated by a three-agent collaborative framework that synthesizes questions from live web traversal, a process designed to keep them temporally relevant and free of contamination. Unlike static benchmarks, it provides a scalable, continuously updated test environment for assessing the reasoning and retrieval abilities of these models, which makes it useful for practitioners building robust search agents.
search-agents benchmark evolving-knowledge