Agents
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
LabVLA is a Vision-Language-Action (VLA) model designed to bridge the gap between written experimental protocols and robotic execution in scientific laboratories. It is built on the Qwen3-VL-4B-Instruct backbone and trained in two stages: pretraining on FAST action tokens, followed by flow-matching post-training with a DiT action expert. On the LabUtopia benchmark, LabVLA achieves the highest average success rate, which points to its potential for laboratory automation in settings where traditional VLA models fall short.
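To make the flow-matching stage more concrete, here is a minimal sketch of one common formulation: a straight-line path from Gaussian noise to the target action, with the model trained to predict the constant velocity along that path. The summary does not state which path, loss, or sampler LabVLA uses, so this is an illustrative assumption rather than the paper's method. All function names are hypothetical, and an oracle velocity function stands in for the DiT action expert.

```python
import random


def flow_matching_pair(action, noise, t):
    """Build one training example for a linear noise-to-action path.

    Assumed convention: t=0 is pure noise, t=1 is the clean action.
    Returns the interpolated point x_t and the velocity the model should predict.
    """
    x_t = [(1.0 - t) * n + t * a for a, n in zip(action, noise)]
    target_velocity = [a - n for a, n in zip(action, noise)]
    return x_t, target_velocity


def mse(pred, target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_sample(velocity_fn, noise, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy 3-dimensional "action" (hypothetical values, not from the paper).
rng = random.Random(0)
action = [0.5, -0.2, 1.0]
noise = [rng.gauss(0.0, 1.0) for _ in action]

x_t, target = flow_matching_pair(action, noise, t=0.3)

# Stand-in for a trained action expert: always returns the true velocity.
oracle = lambda x, t: target
loss = mse(oracle(x_t, 0.3), target)
recovered = euler_sample(oracle, noise, steps=10)
```

With the oracle velocity the loss is zero and Euler integration carries the noise sample to the target action. In a real system, the velocity would come from a learned network conditioned on the vision-language features.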
vision-language, robotics, scientific laboratories