Multimodal · arXiv cs.AI · 7 days ago

Bounding Boxes as Goals: Language-Conditioned Grasping via Neuro-Symbolic Planning

The paper introduces GRASP (Grounded Reasoning and Symbolic Planning), a framework that lets robots perform tabletop manipulation tasks from natural-language prompts without extensive training. GRASP combines a pretrained Vision-Language Model (VLM) with a bounding-box detection pipeline to translate language into neuro-symbolic goal states. It achieved a 73.3% success rate (66 of 90) across 90 real-robot trials at varying difficulty levels. The authors argue that this approach reduces the computational burden and improves adaptability in dynamic environments, which makes the work relevant to practitioners building language-conditioned robotic systems.
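To make "bounding boxes as goals" concrete, here is a minimal sketch of one way a symbolic goal could be checked against detector output. It is not GRASP's method, because this summary does not describe the paper's goal representation. It assumes that goals are conjunctions of spatial predicates over labeled 2D boxes. The `Box` type, the `inside` and `left_of` predicates, and `goal_satisfied` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned 2D box in image pixels (x grows right, y grows down)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def inside(a: Box, b: Box) -> bool:
    """Hypothetical predicate: the center of box a lies within box b."""
    cx, cy = a.center
    return b.x_min <= cx <= b.x_max and b.y_min <= cy <= b.y_max


def left_of(a: Box, b: Box) -> bool:
    """Hypothetical predicate: box a lies entirely to the left of box b."""
    return a.x_max < b.x_min


PREDICATES = {"inside": inside, "left_of": left_of}


def goal_satisfied(goal: list[tuple[str, str, str]], detections: list[Box]) -> bool:
    """Check a conjunction of (predicate, subject, object) atoms against detected boxes."""
    by_label = {box.label: box for box in detections}
    for pred, subj, obj in goal:
        # A goal that mentions an undetected object cannot be verified, so treat it as unmet.
        if subj not in by_label or obj not in by_label:
            return False
        if not PREDICATES[pred](by_label[subj], by_label[obj]):
            return False
    return True


# A goal a language model might emit for "put the apple in the bowl, left of the cup".
goal = [("inside", "apple", "bowl"), ("left_of", "bowl", "cup")]
scene = [
    Box("apple", 110, 90, 130, 110),
    Box("bowl", 80, 60, 160, 140),
    Box("cup", 200, 70, 240, 130),
]
print(goal_satisfied(goal, scene))  # True
```

In a design like this, the VLM only has to emit a small symbolic goal. Checking whether the scene satisfies that goal then reduces to cheap geometry over detections, which is one plausible reading of the summary's claim about lowered computational burden.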

robotics · vision-language · task planning
