SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks
SpatialWorld is a newly introduced benchmark for evaluating the interactive spatial reasoning of multimodal large language models (MLLMs) on real-world tasks. It spans eight simulation backends and 760 human-annotated tasks across multiple domains. Agents must solve each task under vision-only partial observability, acting through a text-based action interface. In an evaluation of 15 advanced models, including GPT-5 and Qwen-3.5, even the best-performing model reached a task success rate of only 17.4%. This leaves substantial room for practitioners developing AI systems to improve spatial reasoning and planning.
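The toy sketch below illustrates the interaction pattern described above and how a task success rate is aggregated. It is not SpatialWorld's actual interface. Everything in it is a hypothetical stand-in: `Task`, `run_episode`, `success_rate`, the one-byte "image" observation, and the `"move forward"` / `"stop"` action strings. In this sketch the agent never sees the task's goal, only the rendered observation, and it acts purely by emitting text. Success rate is the fraction of tasks in which the agent stops in the correct state.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical agent interface: maps an image observation (raw bytes) to a text action.
Agent = Callable[[bytes], str]


@dataclass
class Task:
    # Hypothetical task: move forward exactly `goal` steps, then issue "stop".
    # The goal is hidden from the agent (partial observability).
    goal: int
    max_steps: int = 20


def run_episode(task: Task, agent: Agent) -> bool:
    """Run one episode and return True if the agent stopped at the goal."""
    position = 0
    for _ in range(task.max_steps):
        # Placeholder "image": one byte standing in for rendered pixels.
        observation = bytes([position % 256])
        action = agent(observation).strip().lower()
        if action == "move forward":
            position += 1
        elif action == "stop":
            return position == task.goal
        # Any other text is treated as an invalid action and ignored.
    return False  # step budget exhausted without stopping


def success_rate(tasks: List[Task], agent: Agent) -> float:
    """Fraction of tasks the agent completes successfully."""
    if not tasks:
        raise ValueError("success_rate needs at least one task")
    successes = sum(run_episode(task, agent) for task in tasks)
    return successes / len(tasks)
```

For example, an agent that always walks three steps and then stops succeeds only on tasks whose hidden goal happens to be 3. So `success_rate([Task(3), Task(5), Task(3), Task(1)], lambda img: "stop" if img[0] >= 3 else "move forward")` returns `0.5`.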
multimodal, benchmark, spatial, agents