Agents · arXiv cs.CL · 11 d ago

Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus

The paper introduces XBCP (Cross-lingual BrowseComp-Plus), a benchmark for evaluating deep research agents when the evidence they must find is written in multiple languages, in contrast to existing monolingual benchmarks. It features two settings: a cross-lingual setting, in which the evidence is in a single language, and a multilingual setting, in which documents are distributed across 12 languages. An evaluation of four deep research agents showed significant declines in accuracy and evidence recall when they handled translated evidence, which points to critical challenges in retrieving and integrating cross-lingual information.

benchmark · cross-lingual · retrieval
