SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning
The article introduces SVoT (State-aware Visualization-of-Thought), a reinforcement learning framework that improves spatial reasoning in Multimodal Large Language Models (MLLMs). Instead of jumping straight to an answer, the model produces a chain of state transitions in which each intermediate state and its visualization can be verified. SVoT is trained with Group Relative Policy Optimization (GRPO) and evaluated on five domains, including the new Pacman and Gather environments, which are designed to test multi-hop spatial reasoning systematically. On out-of-distribution test sets it achieves accuracy improvements of up to 65%.

For practitioners, the significance is that current models handle intermediate states and state transitions poorly. Making those states explicit and checkable improves reliability on complex, multi-step reasoning tasks.
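To make the training signal more concrete, here is a minimal sketch of how verifiable intermediate states could feed into GRPO. The article does not describe SVoT's actual reward, so `state_reward`, its 50/50 weighting, and the grid-coordinate state format are all hypothetical. The `group_relative_advantages` function follows the standard GRPO idea: sample a group of rollouts for the same prompt, then normalize each rollout's reward by the group mean and standard deviation. Population standard deviation is an assumption here; implementations vary.

```python
from statistics import mean, pstdev


def state_reward(predicted_states, reference_states, final_answer, reference_answer):
    """HYPOTHETICAL verifiable reward, not SVoT's published reward.

    Scores the fraction of intermediate states that match the reference
    trajectory step by step, plus whether the final answer is correct.
    """
    # zip stops at the shorter sequence, so truncated traces lose credit.
    matches = sum(1 for p, r in zip(predicted_states, reference_states) if p == r)
    state_score = matches / len(reference_states) if reference_states else 0.0
    answer_score = 1.0 if final_answer == reference_answer else 0.0
    # Assumed equal weighting of state accuracy and answer correctness.
    return 0.5 * state_score + 0.5 * answer_score


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each reward normalized within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std (an assumption; some use sample std)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy grid-world example: states are (row, col) positions, e.g. of a Pacman agent.
reference = [(0, 0), (0, 1), (1, 1)]
rollouts = [
    ([(0, 0), (0, 1), (1, 1)], "(1, 1)"),  # all states and answer correct
    ([(0, 0), (1, 0), (1, 1)], "(1, 1)"),  # one wrong intermediate state
    ([(0, 0), (0, 1)], "(0, 1)"),          # truncated trace, wrong answer
    ([], "(2, 2)"),                        # no trace, wrong answer
]
rewards = [state_reward(s, reference, a, "(1, 1)") for s, a in rollouts]
advantages = group_relative_advantages(rewards)
```

Because advantages are relative within the group, a rollout with correct intermediate states is pushed up relative to its siblings even when several rollouts reach the right final answer. This is one plausible way state-level verification could sharpen the learning signal.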