Agents
From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation
The article presents FlowPilot, a mapless navigation policy for long-horizon sidewalk navigation that uses only a monocular RGB camera. The policy is pre-trained with anchored flow matching on large-scale robot fleet data, and a human-in-the-loop preference learning scheme then improves its counterfactual reasoning and social compliance. In simulation, FlowPilot achieves a 42% success rate and 66% route completion. In real-world tests, the FlowPilot-HP variant reduces incident rates by 40% and non-injury rates by 52% compared to the base model. These results are relevant to practitioners working on autonomous navigation in complex environments.
navigation, imitation-learning, long-horizon, policy