Research · arXiv cs.AI · 4 d ago

Do VLMs Reason Like Engineers? A Benchmark and a Stage-wise Evaluation

The authors introduced EngVQA, a multimodal benchmark of 696 problems spanning five engineering subjects, designed to evaluate engineering reasoning in Vision-Language Models (VLMs). They also developed an automatic 8-stage evaluation framework that assesses each step of a model's reasoning process. Applying it revealed significant limitations in the engineering reasoning of current VLMs. The authors argue that such process-oriented evaluation is necessary to make multimodal systems reliable in technical domains, particularly in applications such as engineering education and scientific assistance.
