Research · arXiv cs.CL · 16 d ago

LaViSA: A Language and Vision Structural Ambiguity Benchmark

LaViSA is a new benchmark that assesses whether vision-language models (VLMs) can use visual cues to resolve structural ambiguity. It pairs ambiguous sentences with their disambiguated forms and relevant images across seven categories of ambiguity. Evaluation results show that recent VLMs can leverage visual information to some extent, but they still struggle with specific ambiguity types and nuanced semantic distinctions.

VLM · benchmark · ambiguity
