Inference · arXiv cs.AI · 8 d ago

Mitigating Visual Hallucinations in Multimodal Systems through Retrieval-Augmented Reliability-Aware Inference

This article presents a retrieval-augmented, reliability-aware inference framework intended to make multimodal large language models (MLLMs) more reliable in visual understanding tasks. The framework retrieves nearest neighbors from a database of pretrained visual embeddings and combines multiple reliability indicators to assess whether a prediction can be trusted. In experiments on ImageNet-100, the approach raises the accuracy of accepted predictions from 85.84% to 88.88% (a gain of 3.04 percentage points) and lowers the hallucination-like wrong-answer rate from 14.16% to 11.12%. Because it improves prediction calibration without retraining the large model, it may be of interest to practitioners looking to mitigate visual hallucinations.
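The retrieve-then-check idea described above can be illustrated with a minimal sketch. The summary does not say which reliability indicators the paper uses, so the two shown here (neighbor label agreement and top-neighbor similarity), the function names `retrieve` and `reliability_check`, and the thresholds are all assumptions made for illustration, not the authors' method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, database, k=3):
    """Return the k most similar (similarity, label) pairs.

    `database` is a list of (embedding, label) pairs, standing in for the
    store of pretrained visual embeddings mentioned in the summary.
    """
    scored = [(cosine(query, emb), label) for emb, label in database]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def reliability_check(predicted_label, neighbors,
                      min_agreement=0.6, min_similarity=0.8):
    """Accept or reject a prediction using two hypothetical indicators.

    agreement:      fraction of retrieved neighbors sharing the predicted label
    top_similarity: similarity of the closest retrieved neighbor
    Both thresholds are illustrative values, not taken from the paper.
    """
    if not neighbors:
        return {"accept": False, "agreement": 0.0, "top_similarity": 0.0}
    labels = [label for _, label in neighbors]
    agreement = labels.count(predicted_label) / len(labels)
    top_similarity = neighbors[0][0]
    accept = agreement >= min_agreement and top_similarity >= min_similarity
    return {"accept": accept, "agreement": agreement,
            "top_similarity": top_similarity}
```

In use, the model's predicted label for an image would be passed to `reliability_check` together with the neighbors returned by `retrieve` for that image's embedding. A rejected prediction would be withheld instead of answered, which is one way a wrong-answer rate can fall without retraining the underlying model.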

multimodal · visual hallucinations · reliability
