Research
The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning
This article presents a self-supervised reinforcement learning framework for improving spatial reasoning in Large Reasoning Models (LRMs) without labeled data. It introduces consistency verifiers as reward functions that evaluate geometric and semantic coherence, using both image and textual transformations. The proposed optimal transport-based RL strategy, OT-GRPO, enables models to reach accuracy comparable to that of models trained with ground-truth supervision, offering practitioners a promising way to improve LRM performance on spatial reasoning tasks.
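To make the idea of a consistency verifier concrete, here is a minimal sketch of a label-free reward, assuming a horizontal image flip as the transformation and single-word answers such as "left" or "right". The names FLIP_MAP, undo_flip, consistency_reward, and group_relative_advantages are hypothetical and not taken from the article. The advantage step shows only the standard group-relative normalization used in GRPO-style training; the optimal transport component of OT-GRPO is not described in this summary and is omitted here.

```python
# Assumption: mirroring an image horizontally should swap "left" and "right"
# in a correct answer and leave every other answer unchanged.
FLIP_MAP = {"left": "right", "right": "left"}


def undo_flip(answer: str) -> str:
    """Map an answer given on the mirrored image back to the original frame."""
    return FLIP_MAP.get(answer, answer)


def consistency_reward(original_answer: str, flipped_answer: str) -> float:
    """Return 1.0 if the two answers agree once the flip is undone, else 0.0.

    No ground-truth label is used: the reward depends only on whether the
    model's answers are geometrically coherent with each other.
    """
    return 1.0 if original_answer == undo_flip(flipped_answer) else 0.0


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one sampled group: (r - mean) / std."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = variance ** 0.5
    if std == 0.0:
        # Every sample earned the same reward, so there is no learning signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]
```

In this sketch, a model that answers "left" on the original image and "right" on the mirrored one earns a reward of 1.0, while answering "left" on both earns 0.0, regardless of which answer is actually correct.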
spatial reasoning, self-supervised learning, consistency