Multimodal · arXiv cs.AI · 12 d ago

Enhancing Pathological VLMs with Cross-scale Reasoning

The paper introduces a cross-scale reasoning paradigm for vision-language models (VLMs) in pathology, addressing the need to reason across multiple magnifications when interpreting images. It presents Scale-VQA, a benchmark of 4,685 questions based on 2,537 pathology images, and ScaleReasoner-R1, a model trained via reinforcement learning that achieves state-of-the-art results on both Scale-VQA and established single-scale benchmarks. For practitioners, the work matters because it improves the ability of VLMs to integrate multi-scale evidence, which may in turn improve diagnostic accuracy in pathological assessments.

pathology · vision-language model · cross-scale reasoning
