Research
Benchmarking Vision-Language-Action Models on SO-101: Failure and Recovery Analysis
This article introduces the SO-101 benchmark for evaluating Vision-Language-Action (VLA) models on low-cost robotic platforms, addressing a gap in how the robustness of these models is assessed in real-world scenarios. The benchmark comprises four manipulation tasks, a structured failure taxonomy, and recovery-aware evaluation metrics. Results show that stronger pretrained VLA models outperform imitation learning baselines, although performance varies significantly by task. The findings emphasize the need for comprehensive failure and recovery analysis in embodied AI systems, particularly in low-cost deployments.
vision-language, benchmark, robotics