Multimodal
V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions
The paper introduces V-REX, a benchmark suite that evaluates visual reasoning in vision-language models (VLMs) through multi-step exploration structured as a Chain-of-Questions (CoQ). V-REX assesses how well VLMs plan and follow complex multi-step tasks, exposing performance discrepancies and the areas where open-ended visual reasoning still needs improvement. The protocol is aimed at practitioners who want to improve VLMs' interpretive abilities and reasoning processes in real-world applications.
visual reasoning, evaluation, benchmark