Agents
MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft
The paper introduces MineExplorer, a benchmark for evaluating the open-world exploration capabilities of multimodal large language model (MLLM) agents in Minecraft. It organizes tasks in a ReAct-style formulation and incorporates a multi-agent synthesis workflow that improves task reliability over single-agent approaches. The findings indicate that advanced MLLMs perform well on simpler tasks but struggle with complex, multi-hop ones, which points to limits in current models' exploration abilities and a need for better coordination in dynamic environments.
mllm, exploration, minecraft
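
For readers unfamiliar with the ReAct-style formulation mentioned in the summary, the general pattern (alternate a reasoning step, an action, and an observation until the task ends) can be sketched as below. This is only an illustrative toy, not the benchmark's actual interface: `ToyWorld`, `scripted_policy`, and `react_loop` are hypothetical names, the corridor environment stands in for Minecraft, and the scripted policy stands in for an MLLM call.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    """One thought/action/observation triple in the trace."""
    thought: str
    action: str
    observation: str


class ToyWorld:
    """Hypothetical stand-in environment: a 1-D corridor with a goal cell."""

    def __init__(self, goal: int = 3) -> None:
        self.pos = 0
        self.goal = goal

    def step(self, action: str) -> str:
        # Apply the action, then describe what the agent now observes.
        if action == "forward":
            self.pos += 1
        elif action == "back":
            self.pos -= 1
        if self.pos == self.goal:
            return "goal reached"
        return f"at position {self.pos}"


def scripted_policy(history: list[Step]) -> tuple[str, str]:
    """Placeholder for the model call (assumption): always moves forward."""
    return ("goal not seen yet, keep moving", "forward")


def react_loop(world: ToyWorld, policy, max_steps: int = 10) -> list[Step]:
    """Alternate reasoning, acting, and observing until done or out of steps."""
    history: list[Step] = []
    for _ in range(max_steps):
        thought, action = policy(history)   # reason, then choose an action
        observation = world.step(action)    # act and observe the result
        history.append(Step(thought, action, observation))
        if observation == "goal reached":
            break
    return history


trace = react_loop(ToyWorld(goal=3), scripted_policy)
for s in trace:
    print(f"Thought: {s.thought} | Action: {s.action} | Obs: {s.observation}")
```

In a real agent the policy would condition on the accumulated history (and on visual input, for an MLLM) rather than returning a fixed action; the loop structure is the part this sketch is meant to show.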