Agents
LUCID: Learning Embodiment-Agnostic Intent Models from Unstructured Human Videos for Scalable Dexterous Robot Skill Acquisition
LUCID is a two-stage framework that learns embodiment-agnostic intent models from unstructured human videos for scalable robot skill acquisition. An intent model predicts short-horizon intent from the current observation, and an embodiment-specific sensorimotor policy translates that intent into robot actions. Evaluated on five real-world manipulation tasks, LUCID demonstrates zero-shot transfer and highlights the potential of internet-scale video datasets for training robot skills without structured demonstrations.
robot-learning, intent-models, unstructured-data
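
The two-stage split described in the summary (an embodiment-agnostic intent model feeding an embodiment-specific policy) can be illustrated with a minimal sketch. This is not LUCID's implementation: the summary does not say how intent is represented or how either stage is built, so the class `TwoStageController`, the functions `toy_intent_model` and `toy_gripper_policy`, the list-of-floats representation, and the `max_step` limit are all hypothetical stand-ins chosen only to show the data flow.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class TwoStageController:
    """Hypothetical wrapper showing the two-stage data flow only."""

    # Stage 1 (assumed interface): observation -> short-horizon intent.
    # Contains nothing robot-specific, so it could be shared across robots.
    intent_model: Callable[[Vector], Vector]
    # Stage 2 (assumed interface): (observation, intent) -> robot action.
    # This is the only part tied to a particular embodiment.
    policy: Callable[[Vector, Vector], Vector]

    def act(self, observation: Vector) -> Vector:
        intent = self.intent_model(observation)
        return self.policy(observation, intent)


def toy_intent_model(observation: Vector) -> Vector:
    # Toy stand-in: "intent" is a desired displacement that moves
    # halfway toward the origin. A learned model would replace this.
    return [-0.5 * x for x in observation]


def toy_gripper_policy(observation: Vector, intent: Vector,
                       max_step: float = 0.1) -> Vector:
    # Toy stand-in: clip each component of the intended displacement
    # to an assumed per-step actuator limit of this embodiment.
    return [max(-max_step, min(max_step, d)) for d in intent]


controller = TwoStageController(toy_intent_model, toy_gripper_policy)
action = controller.act([0.4, -0.1, 0.0])
print(action)
```

Under these assumptions, moving to a different robot would mean replacing only the `policy` callable while reusing the same `intent_model`, which is the separation the summary attributes to the framework.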