RAGarXiv cs.AI — 21 h ago

LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake

LakeQA is a newly introduced benchmark for search-centric question answering over large data lakes, comprising approximately 9.5 TB of text from diverse sources like Wikipedia and open-source government data. It requires long-horizon multi-hop reasoning and implicit intermediate steps, with each task annotated by Ph.D.-level experts to ensure quality. Experimental results indicate that even advanced models, such as GPT-5.2, struggle with this benchmark, achieving only an 18.37% exact-match score, highlighting the need for improved LLM capabilities in both search and reasoning for practical applications.

question answeringsearchdata lakerelevance 0.00 · engagement 0.00

Read at source ↗← all news