Safety
GEASS: Gated Evidence-Adaptive Selective Caption Trust for Vision-Language Models
The article introduces GEASS (Gated Evidence-Adaptive Selective Caption Trust), a module designed to make Vision-Language Models (VLMs) more reliable by addressing hallucination in generated captions. GEASS is a training-free, logit-level mechanism: it adjusts how much the model trusts a caption based on the model's confidence and on the entropy reduction the caption provides. According to the article, this improves performance on benchmarks such as HallusionBench and POPE without adding parameters. For practitioners, the result is a more reliable way to integrate captions into VLMs, because it mitigates the adverse effects of captions competing with image data.
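The gating idea summarized above can be illustrated with a small sketch. This is not the authors' implementation: the function name `gated_caption_logits`, the thresholds, the blending rule, and the choice to close the gate when the image-only prediction is already confident are all assumptions made for illustration. The sketch only shows what a training-free, logit-level gate driven by confidence and entropy reduction might look like.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def gated_caption_logits(image_logits, caption_logits,
                         conf_threshold=0.9, min_entropy_drop=0.0):
    """Hypothetical gate (an assumption, not the published GEASS rule).

    image_logits:   next-token logits from the image-only pass
    caption_logits: next-token logits when a caption is also provided
    Returns (logits_to_use, trust), where trust is in [0, 1].
    """
    p_img = softmax(image_logits)
    p_cap = softmax(caption_logits)
    h_img = entropy(p_img)
    entropy_drop = h_img - entropy(p_cap)

    # Assumed rule: ignore the caption if the model is already confident
    # from the image alone, or if the caption does not reduce uncertainty.
    if max(p_img) >= conf_threshold or entropy_drop <= min_entropy_drop or h_img <= 0.0:
        return list(image_logits), 0.0

    # Assumed trust weight: the fraction of image-only entropy the caption removes.
    trust = min(1.0, entropy_drop / h_img)
    blended = [(1.0 - trust) * a + trust * b
               for a, b in zip(image_logits, caption_logits)]
    return blended, trust


if __name__ == "__main__":
    # The image-only pass is undecided; the caption sharpens the distribution.
    logits, trust = gated_caption_logits([0.0, 0.0], [3.0, 0.0])
    print(logits, trust)
```

Because the gate works only on logits from two forward passes, a scheme of this shape adds no parameters and needs no training, which matches the properties the summary attributes to GEASS.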
vision-language, hallucination, captioning