Safety · arXiv cs.AI · 7 d ago

ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning

ClinHallu is a new benchmark for diagnosing stage-wise hallucinations in medical multimodal large language models (MLLMs). It comprises 7,031 validated instances whose structured reasoning traces are divided into three stages: Visual Recognition, Knowledge Recall, and Reasoning Integration. The stage labels make it possible to identify the stage at which a hallucination arises during reasoning, and stage-replacement interventions assess how correcting an individual stage affects the final output. This structure is intended to help practitioners diagnose and mitigate reasoning failures in medical MLLMs used for clinical decision support.

hallucinations · medical · benchmark
