Agents
ScholarQuest: A Taxonomy-Guided Benchmark for Agentic Academic Paper Search in Open Literature Environments
ScholarQuest is a newly released large-scale benchmark for evaluating agentic academic paper search in open literature environments. It is built from over 1,000 computer science topics and four research intents, and it pairs a scalable answer construction system with a shared retrieval backend, ScholarBase, so that evaluations are reproducible. Agentic methods outperform traditional retrieval baselines, but substantial room for improvement remains: the best-performing agent reaches only 0.314 Recall@100 and 0.355 Recall@All.
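For readers unfamiliar with the reported metrics, the sketch below shows how recall at a cutoff is conventionally computed. It is a minimal illustration, not ScholarQuest's evaluation code: the function name `recall_at_k`, the paper IDs, and the reading of Recall@All as recall over the agent's full returned list are all assumptions made here.

```python
from typing import Optional, Sequence, Set


def recall_at_k(ranked_ids: Sequence[str],
                relevant_ids: Set[str],
                k: Optional[int] = None) -> float:
    """Fraction of relevant papers found in the top-k results.

    k=None scores the entire returned list (assumed here to
    correspond to "Recall@All"; the benchmark may define it differently).
    """
    if not relevant_ids:
        return 0.0  # no ground truth, so recall is defined as 0 in this sketch
    top = ranked_ids if k is None else ranked_ids[:k]
    hits = len(set(top) & relevant_ids)  # distinct relevant papers retrieved
    return hits / len(relevant_ids)


# Hypothetical toy data: five returned papers, four relevant ones.
ranked = ["p1", "p7", "p3", "p9", "p4"]
relevant = {"p1", "p3", "p4", "p8"}

recall_at_2 = recall_at_k(ranked, relevant, k=2)   # 1 of 4 relevant papers
recall_at_all = recall_at_k(ranked, relevant)      # 3 of 4 relevant papers
```

Under this reading, a Recall@100 of 0.314 would mean that, on average, fewer than a third of the ground-truth papers appear in an agent's top 100 results.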
academic search, llm, benchmark