Agents
RoboPIN: Grounded Embodied Reasoning via Pinned Chain-of-Thought
The paper introduces RoboPIN, a 4-billion-parameter model built around Pinned Chain-of-Thought, a structured reasoning paradigm designed to enhance embodied reasoning by anchoring each reasoning step to visual evidence. RoboPIN reports a 12% performance improvement over the 7-billion-parameter Mimo-Embodied model across 14 benchmarks and emphasizes improved grounding accuracy and identity consistency, achieved through a novel data generation pipeline and a three-stage post-training approach. For practitioners, the work addresses common issues in multi-step reasoning tasks, particularly in dynamic visual environments.
embodied reasoning, visual grounding, chain-of-thought