RAG
RTSGameBench: An RTS Benchmark for Strategic Reasoning by Vision-Language Models
RTSGameBench is a newly introduced benchmark for evaluating strategic reasoning in Vision-Language Models (VLMs), built on the real-time strategy (RTS) game Beyond All Reason. Its evaluation framework combines diverse gameplay scenarios, mini-games that each target a specific competency, and a self-evolving generation system that creates new challenges. Results on the benchmark show that current state-of-the-art VLMs struggle with coordination and planning in complex, multi-agent environments, which marks both as key areas for improvement in models applied to strategic tasks.
strategic reasoning, vlm, benchmark