Lost at the End: Primacy Bias in Multimodal Retrieval-Augmented Question Answering
The article studies the "Lost at the End" phenomenon in multimodal knowledge-based visual question answering (KB-VQA). Long-context readers have been reported to follow a U-shaped positional pattern, using relevant information best when it sits at the beginning or end of the context. The KB-VQA readers studied here instead show a primacy bias: answers are markedly more accurate when the gold passage appears early in the retrieved context than when it appears late. Across three open-source vision-language models (VLMs) with 7B/8B parameters and two KB-VQA benchmarks, presenting the gold passage first yields a 16 to 26 point advantage over presenting it later. The authors argue that recall@k is an inadequate measure for deployed KB-VQA systems, because it credits a gold passage equally at any rank within the top k. They introduce a gold-position protocol for evaluating reader-side interventions that mitigate the bias, which could guide practitioners in model design and retrieval strategy.
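To make the contrast between recall@k and a position-controlled evaluation concrete, here is a minimal sketch of what a gold-position protocol could look like, under stated assumptions. It is not the authors' code. The gold passage is inserted at each slot of a fixed-size context, the reader is run once per slot, and accuracy is reported per position. The names `place_gold`, `gold_position_accuracy` and `toy_reader` are hypothetical. `toy_reader` is a deliberately crude stand-in for a VLM that only "reads" its first two passages, and the image input of a real KB-VQA example is omitted for brevity.

```python
from typing import Callable, Sequence

# Hypothetical reader interface: (question, ordered passages) -> predicted answer.
Reader = Callable[[str, Sequence[str]], str]


def place_gold(gold: str, distractors: Sequence[str], position: int) -> list[str]:
    """Return a context with the gold passage inserted at `position` (0-indexed)."""
    if not 0 <= position <= len(distractors):
        raise ValueError("position out of range")
    ctx = list(distractors)
    ctx.insert(position, gold)
    return ctx


def recall_at_k(ranked: Sequence[str], gold: str, k: int) -> float:
    """Standard recall@k for one gold passage: 1.0 if it is anywhere in the top k."""
    return 1.0 if gold in ranked[:k] else 0.0


def gold_position_accuracy(reader: Reader, examples, k: int) -> list[float]:
    """Reader accuracy with the gold passage fixed at each slot 0..k-1.

    Each example is (question, gold_passage, distractors, answer) and must
    carry exactly k - 1 distractors, so every context holds k passages.
    """
    correct = [0.0] * k
    for question, gold, distractors, answer in examples:
        assert len(distractors) == k - 1, "need exactly k - 1 distractors"
        for pos in range(k):
            pred = reader(question, place_gold(gold, distractors, pos))
            correct[pos] += float(pred == answer)
    return [c / len(examples) for c in correct]


def toy_reader(question: str, passages: Sequence[str], budget: int = 2) -> str:
    """Hypothetical stand-in reader that only looks at its first `budget` passages."""
    for p in passages[:budget]:
        if p.startswith("ANSWER:"):
            return p.removeprefix("ANSWER:").strip()
    return "unknown"


if __name__ == "__main__":
    gold = "ANSWER: red"
    distractors = ["d1", "d2", "d3", "d4"]
    examples = [("What colour is the bird?", gold, distractors, "red")]
    print(gold_position_accuracy(toy_reader, examples, k=5))
    print([recall_at_k(place_gold(gold, distractors, p), gold, 5) for p in range(5)])
```

With the toy reader, per-position accuracy is `[1.0, 1.0, 0.0, 0.0, 0.0]`, while recall@5 is `1.0` at every position. That is the gap the article describes: retrieval counts as a success even though the reader fails whenever the gold passage lands late. With a real VLM, the per-position curve would come from the model rather than from a hard-coded reading budget.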