Research
Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation
The authors introduce EngVQA, a multimodal benchmark of 696 problems spanning five engineering subjects, designed to evaluate engineering reasoning in Vision-Language Models (VLMs). They also develop an 8-stage automatic evaluation framework that assesses each step of the reasoning process, and its results reveal significant limitations in current VLMs' engineering reasoning. The work argues that process-oriented evaluation is needed to make multimodal systems more reliable in technical domains, with applications in engineering education and scientific assistance.
vlm, engineering, benchmark, reasoning