Research
LaViSA: A Language and Vision Structural Ambiguity Benchmark
LaViSA is a benchmark for assessing whether Vision and Language Models (VLMs) can use visual cues to resolve structural ambiguity. It pairs ambiguous sentences with their disambiguated forms and relevant images across seven categories of ambiguity. Evaluation results show that recent VLMs can leverage visual information to some extent, but they still struggle with specific ambiguity types and nuanced semantic distinctions, which points to concrete areas for improvement in model performance.
VLM, benchmark, ambiguity
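To make the setup concrete, here is a minimal Python sketch of how one benchmark record and a per-category accuracy score might be represented. It is an illustration only: the field names, the category labels (`pp_attachment`, `coordination`), the example sentences, the image paths, and the forced-choice scoring scheme are all assumptions made for this sketch, not LaViSA's actual schema or evaluation protocol.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AmbiguityItem:
    """One hypothetical benchmark record (all field names are assumptions)."""
    category: str               # illustrative label for an ambiguity category
    ambiguous_sentence: str     # sentence with more than one structural reading
    readings: tuple[str, ...]   # disambiguated paraphrases of the sentence
    image_path: str             # image assumed to support exactly one reading
    gold_index: int             # index into `readings` that the image depicts


def accuracy_by_category(
    items: Sequence[AmbiguityItem],
    choose: Callable[[AmbiguityItem], int],
) -> dict[str, float]:
    """Score a chooser (e.g. a wrapped VLM) that returns a reading index per item."""
    totals: dict[str, int] = {}
    correct: dict[str, int] = {}
    for item in items:
        totals[item.category] = totals.get(item.category, 0) + 1
        if choose(item) == item.gold_index:
            correct[item.category] = correct.get(item.category, 0) + 1
    return {cat: correct.get(cat, 0) / n for cat, n in totals.items()}


# Illustrative items; sentences and paths are made up for this sketch.
demo_items = [
    AmbiguityItem(
        "pp_attachment",
        "I saw the man with the telescope.",
        ("I used the telescope to see the man.", "I saw the man who had the telescope."),
        "images/telescope_viewer.jpg",
        0,
    ),
    AmbiguityItem(
        "pp_attachment",
        "I saw the man with the telescope.",
        ("I used the telescope to see the man.", "I saw the man who had the telescope."),
        "images/telescope_holder.jpg",
        1,
    ),
    AmbiguityItem(
        "coordination",
        "Old men and women sat on the bench.",
        ("Both the men and the women are old.", "Only the men are old."),
        "images/bench_all_old.jpg",
        0,
    ),
]


def always_first(item: AmbiguityItem) -> int:
    """Trivial baseline that ignores the image and picks the first reading."""
    return 0


scores = accuracy_by_category(demo_items, always_first)
```

A baseline like `always_first` ignores the image entirely, so comparing a model against it per category is one way to see where visual information is actually being used.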