Agents · arXiv cs.AI · 4 d ago

LUCID: Learning Embodiment-Agnostic Intent Models from Unstructured Human Videos for Scalable Dexterous Robot Skill Acquisition

LUCID is a two-stage framework that learns embodiment-agnostic intent models from unstructured human videos, with the aim of scalable dexterous robot skill acquisition. In the first stage, an intent model predicts short-horizon intent from the current observation; in the second, an embodiment-specific sensorimotor policy translates that intent into robot actions. Evaluated on five real-world manipulation tasks, LUCID demonstrates zero-shot transfer and highlights the potential of internet-scale video datasets for training robot skills without structured demonstrations.
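The split between intent prediction and action generation can be illustrated with a minimal sketch. Everything named here is hypothetical and is not taken from the paper: the `Intent` container, the `IntentModel` and `SensorimotorPolicy` interfaces, the representation of intent as a short list of waypoints, and the two toy implementations are assumptions chosen only to show how an embodiment-agnostic stage might feed an embodiment-specific one.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

Vector = Sequence[float]


@dataclass(frozen=True)
class Intent:
    """Hypothetical embodiment-agnostic intent: a short horizon of waypoints."""
    waypoints: list[list[float]]


class IntentModel(Protocol):
    """Stage 1 interface (assumed): observation -> short-horizon intent."""
    def predict(self, observation: Vector) -> Intent: ...


class SensorimotorPolicy(Protocol):
    """Stage 2 interface (assumed): observation + intent -> robot action."""
    def act(self, observation: Vector, intent: Intent) -> list[float]: ...


class LinearGoalIntent:
    """Toy stand-in for a learned intent model.

    Interpolates linearly from the current observation to a fixed goal.
    A real model would be trained on video; this is only a placeholder.
    """

    def __init__(self, goal: Vector, horizon: int = 3) -> None:
        self.goal = list(goal)
        self.horizon = horizon

    def predict(self, observation: Vector) -> Intent:
        waypoints = []
        for k in range(1, self.horizon + 1):
            alpha = k / self.horizon
            waypoints.append(
                [o + alpha * (g - o) for o, g in zip(observation, self.goal)]
            )
        return Intent(waypoints)


class ProportionalPolicy:
    """Toy embodiment-specific policy: step toward the first waypoint."""

    def __init__(self, gain: float = 0.5) -> None:
        self.gain = gain

    def act(self, observation: Vector, intent: Intent) -> list[float]:
        target = intent.waypoints[0]
        return [self.gain * (t - o) for t, o in zip(target, observation)]


def control_step(
    observation: Vector, intent_model: IntentModel, policy: SensorimotorPolicy
) -> list[float]:
    """Run one step of the two-stage pipeline: predict intent, then act on it."""
    intent = intent_model.predict(observation)
    return policy.act(observation, intent)


if __name__ == "__main__":
    action = control_step(
        [0.0, 0.0], LinearGoalIntent(goal=[3.0, 6.0]), ProportionalPolicy()
    )
    print(action)
```

Under this assumed design, moving to a different robot would mean replacing only the `SensorimotorPolicy` implementation, while the intent model stays unchanged.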

robot-learning · intent-models · unstructured-data
