Embodied-BenchClaw: An Autonomous Multi-Agent System for Embodied Spatial Intelligence Benchmark Construction
Embodied-BenchClaw is an autonomous multi-agent system for efficiently constructing embodied spatial intelligence benchmarks. It organizes construction as a five-stage pipeline: intent blueprinting, data collection, structuring and cleaning, benchmark synthesis, and evaluation reporting. An extensible Skill Library and built-in quality control processes improve benchmark reusability and reliability, and allow the system to build diverse benchmarks across domains such as indoor and outdoor spatial reasoning and robotic manipulation. Automating benchmark construction reduces manual effort while keeping the benchmarks verifiable and maintainable, both of which are crucial for advancing research on embodied AI systems.
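
To make the five-stage structure concrete, the following is a minimal sketch of how such a pipeline could be driven by a skill registry. Only the five stage names come from the abstract. The `SkillLibrary` class, the `run_pipeline` function, the dictionary-based state, and the placeholder skills are hypothetical illustrations and are not the system's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Stage names follow the abstract; every other name here is hypothetical.
STAGES = [
    "intent_blueprinting",
    "data_collection",
    "structuring_and_cleaning",
    "benchmark_synthesis",
    "evaluation_reporting",
]

# A skill reads the shared state and returns the keys it wants to add or update.
Skill = Callable[[dict], dict]


@dataclass
class SkillLibrary:
    """Hypothetical registry mapping each stage name to one reusable skill."""

    skills: Dict[str, Skill] = field(default_factory=dict)

    def register(self, stage: str, skill: Skill) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.skills[stage] = skill


def run_pipeline(library: SkillLibrary, request: str) -> dict:
    """Run the stages in order, merging each skill's output into the state."""
    state = {"request": request, "trace": []}
    for stage in STAGES:
        skill = library.skills.get(stage)
        if skill is None:
            raise KeyError(f"no skill registered for stage: {stage}")
        updates = skill(state)
        state = {**state, **updates}
        # The trace records which stages ran, so a run can be audited later.
        state["trace"] = state["trace"] + [stage]
    return state


if __name__ == "__main__":
    library = SkillLibrary()
    for name in STAGES:
        # Placeholder skills that only mark their stage as completed.
        library.register(name, lambda state, name=name: {name: "done"})
    result = run_pipeline(library, "indoor spatial reasoning")
    print(result["trace"])
```

In this sketch, extending the system to a new domain amounts to registering a different skill for one or more stages while the driver loop stays unchanged, which is one plausible reading of how an extensible Skill Library supports reuse.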