RAG
When Iterative RAG Beats Ideal Evidence: A Diagnostic Study in Scientific Multi-hop Question Answering
The study introduces a controlled diagnostic framework that compares Iterative Retrieval-Augmented Generation (RAG) with static RAG on multi-hop scientific question answering, using the ChemKGMultiHopQA dataset. It benchmarks eleven state-of-the-art LLMs under three conditions: No Context, Gold Context, and Iterative RAG. Iterative RAG can outperform Gold Context by up to 25.6 percentage points, particularly for models that are not fine-tuned for reasoning. The results underscore the value of staged retrieval and give practitioners guidance on optimizing RAG systems for scientific applications.
rag, multi-hop, question answering
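
To make the three evaluation conditions from the summary concrete, here is a minimal sketch of how No Context, Gold Context, and Iterative RAG differ in what the model sees. It is not the study's implementation. The corpus, the keyword retriever, the follow-up query rule, and the stub `toy_answer` are all hypothetical stand-ins, and the placeholder facts ("compound A", "plant C") are invented for illustration. This toy does not reproduce the reported gap between Iterative RAG and Gold Context; it only shows the staged retrieval loop.

```python
from typing import Callable, Dict, List

# Hypothetical two-hop corpus with placeholder facts (not from ChemKGMultiHopQA).
CORPUS: Dict[str, str] = {
    "compound a": "Compound A is made from compound B.",
    "compound b": "Compound B is extracted from plant C.",
}


def retrieve(query: str) -> List[str]:
    """Toy keyword retriever: return passages whose key occurs in the query."""
    q = query.lower()
    return [passage for key, passage in CORPUS.items() if key in q]


def iterative_context(question: str, max_steps: int = 3) -> List[str]:
    """Collect evidence hop by hop, re-querying with the newest passage."""
    context: List[str] = []
    query = question
    for _ in range(max_steps):
        new = [p for p in retrieve(query) if p not in context]
        if not new:  # nothing new was found, so stop early
            break
        context.extend(new)
        # Assumption: a real system would ask the LLM to write the next query.
        query = context[-1]
    return context


def toy_answer(question: str, context: List[str]) -> str:
    """Stub standing in for an LLM call; it only reads the context."""
    return "plant C" if any("plant C" in p for p in context) else "unknown"


def run_conditions(
    question: str,
    gold: List[str],
    answer: Callable[[str, List[str]], str] = toy_answer,
) -> Dict[str, str]:
    """Answer the same question under the three context conditions."""
    contexts = {
        "no_context": [],                               # question only
        "gold_context": gold,                           # all gold evidence at once
        "iterative_rag": iterative_context(question),   # staged retrieval
    }
    return {name: answer(question, ctx) for name, ctx in contexts.items()}


QUESTION = "Which plant is the source of the precursor of compound A?"
GOLD = list(CORPUS.values())
results = run_conditions(QUESTION, GOLD)
print(results)
```

In a real evaluation, `toy_answer` would be replaced by a call to the model under test, and per-condition accuracy would be compared across the full question set.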