RAG
How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation
The article introduces HieraRAG, a hierarchical framework for choosing how fine-grained a benchmark should be when evaluating retrieval-augmented generation (RAG) systems. It offers empirical guidance on question characteristics by generating 5,872 synthetic question-answer pairs across three dimensions, each at several granularity levels. The results show that the optimal granularity differs by dimension, with complexity benefiting from fine distinctions. Together with its Coherence Ratio metric, the framework lets practitioners tailor evaluation granularity to their own RAG configurations and improve the discriminative power of their benchmarks.
benchmark, rag, evaluation