Research · arXiv cs.AI · 7 d ago

PaLMR: Towards Faithful Visual Reasoning via Multimodal Process Alignment

PaLMR is a framework that improves visual reasoning in Multimodal Large Language Models (MLLMs) by addressing process-level misalignment with two components: a perception-aligned data layer that generates structured pseudo-ground-truths, and a process-aligned optimization layer that uses a hierarchical reward fusion scheme. Together, these improve reasoning fidelity and reduce hallucinations. In experiments with Qwen2.5-VL-7B, PaLMR achieves state-of-the-art results on HallusionBench while maintaining strong performance on other benchmarks, which suggests it could improve the reliability and interpretability of multimodal reasoning systems.

visual reasoning · multimodal · reinforcement learning
