Research · arXiv cs.CL · 8 d ago

Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings

This paper proposes a method for mitigating hallucinations in Large Vision-Language Models (LVLMs) by refining textual embeddings so that they better incorporate visual features. According to the authors, the approach strengthens multimodal reasoning by promoting a more balanced attention distribution between text and visual inputs. It also improves results on hallucination benchmarks, including +9.33% on MMVP-MLLM and +3% on HallusionBench. The work targets a common failure mode of LVLMs: outputs that are linguistically coherent but inaccurate with respect to the visual input. Reducing this failure mode makes LVLM applications more reliable in practice.
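The summary attributes the gains to a more balanced attention distribution between text and visual inputs. One simple way to inspect that balance in any LVLM is to measure what fraction of attention mass the query positions place on image-token positions. The sketch below is a hypothetical diagnostic, not the paper's method. The function name `modality_attention_share` and the input layout are assumptions: a single row-stochastic attention matrix (for example, one head of one layer) plus a per-position mask marking image tokens.

```python
from typing import List, Sequence, Tuple


def modality_attention_share(
    attn: Sequence[Sequence[float]],
    is_visual: Sequence[bool],
) -> Tuple[float, float]:
    """Hypothetical diagnostic (not from the paper).

    attn[q][k] is the attention weight from query position q to key position k.
    is_visual[k] is True when key position k holds an image token.
    Returns (visual_share, text_share), averaged over query rows.
    """
    if not attn:
        raise ValueError("attention matrix is empty")
    visual_total = 0.0
    counted_rows = 0
    for row in attn:
        if len(row) != len(is_visual):
            raise ValueError("each row must have one weight per key position")
        row_sum = sum(row)
        if row_sum <= 0.0:
            continue  # skip fully masked rows instead of dividing by zero
        # Normalise by the row sum so the share is a fraction even if the
        # weights do not sum exactly to 1 (e.g. after slicing or rounding).
        visual_total += sum(w for w, v in zip(row, is_visual) if v) / row_sum
        counted_rows += 1
    if counted_rows == 0:
        raise ValueError("no row carries any attention mass")
    visual_share = visual_total / counted_rows
    return visual_share, 1.0 - visual_share


if __name__ == "__main__":
    # Toy example: 2 query rows over 4 key positions; the first two are image tokens.
    toy_attn: List[List[float]] = [
        [0.10, 0.10, 0.40, 0.40],
        [0.05, 0.05, 0.45, 0.45],
    ]
    toy_mask = [True, True, False, False]
    visual, text = modality_attention_share(toy_attn, toy_mask)
    print(f"visual share: {visual:.2f}, text share: {text:.2f}")
```

In the toy example, roughly 15% of attention goes to image tokens and 85% to text tokens. A heavily text-skewed share like this is the kind of imbalance the summary associates with hallucination. Comparing the share before and after an intervention is one way to check whether that intervention actually shifts attention toward the visual input.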

hallucinations · vision-language-models · embedding
