From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation
Summary
FlowPilot is a mapless navigation policy for robust, efficient long-horizon sidewalk navigation using only a monocular RGB camera. Developed at UCLA, the framework addresses limitations of traditional imitation learning, such as compounding errors and poor social compliance. It pre-trains the policy with anchored flow matching on large-scale robot fleet data, capturing diverse, multimodal navigation behaviors. A human-in-the-loop preference learning scheme then fine-tunes the policy with minimal human intervention data, improving counterfactual reasoning and social compliance. In simulation, FlowPilot-Base achieved a 42% success rate and 66% route completion. In real-world experiments, FlowPilot-HP, the preference-fine-tuned variant, further improved robustness and social compliance, reducing the Intervention Rate (IR) by 40.0% and the Normalized Intervention Rate by 52.1% compared to the base model.
Key takeaway
For robotics engineers deploying micro-mobility robots in complex sidewalk environments, relying solely on imitation learning for navigation policies risks poor social compliance and compounding errors. Consider fine-tuning pre-trained models with human-in-the-loop preference learning to gain robot-specific precision and adherence to social norms. In FlowPilot's real-world experiments, this approach reduced the Intervention Rate by 40.0% compared to the base model, indicating better robustness and fewer required human interventions.
Key insights
Bridging imitation learning with human preference alignment significantly enhances robot navigation robustness and social compliance.
Principles
- Anchored flow matching models multimodal action distributions.
- Preference learning refines policies for robot-specific precision.
- Gated cross-attention improves context utilization, mitigating goal-driven shortcuts.
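As a rough illustration of the gated cross-attention principle, the sketch below lets a set of policy tokens attend to scene tokens and scales the attended signal with a learnable gate before adding it back. The summary does not describe FlowPilot's actual layer, so the tanh gate, the single attention head, and every name here (`gated_cross_attention`, `policy_tokens`, `scene_tokens`, the weight matrices) are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def gated_cross_attention(queries, context, w_q, w_k, w_v, gate):
    """Single-head cross-attention with a tanh-gated residual (hypothetical layer).

    queries: (n_q, d) tokens that ask for context (e.g. goal or policy tokens)
    context: (n_c, d) tokens that provide context (e.g. scene features)
    gate:    scalar; tanh(gate) scales how much attended context is mixed in
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attended = softmax(scores, axis=-1) @ v
    return queries + np.tanh(gate) * attended


rng = np.random.default_rng(0)
d = 8
policy_tokens = rng.normal(size=(2, d))  # illustrative inputs, not real features
scene_tokens = rng.normal(size=(5, d))
w_q, w_k, w_v = (0.1 * rng.normal(size=(d, d)) for _ in range(3))

# With the gate at zero the layer is an identity map on the queries.
closed = gated_cross_attention(policy_tokens, scene_tokens, w_q, w_k, w_v, gate=0.0)
# With a non-zero gate, scene context is blended into the queries.
opened = gated_cross_attention(policy_tokens, scene_tokens, w_q, w_k, w_v, gate=1.0)
```

In this form, a gate initialized at zero leaves the queries untouched, and training can open it gradually. That is one plausible way to control how strongly scene context influences the policy.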
Method
Pre-train a policy using anchored flow matching on large datasets, then fine-tune it with reward-free preference learning from human interventions, regularizing towards the initial policy.
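The pre-training step can be sketched as a standard conditional flow matching objective with a straight-line path, where the source sample is a perturbed anchor trajectory rather than pure noise. The summary does not define "anchored" precisely, so that reading is an assumption, and `flow_matching_pair`, `flow_matching_loss`, `sigma`, and the toy trajectories are hypothetical.

```python
import numpy as np


def flow_matching_pair(anchor, expert, t, rng, sigma=0.1):
    """Build one training pair for a straight-line flow from anchor to expert.

    anchor: (horizon, 2) prototype trajectory used as the flow's starting point
            (assumed interpretation of "anchored")
    expert: (horizon, 2) demonstrated trajectory from the dataset
    t:      scalar in [0, 1], position along the path
    """
    x0 = anchor + sigma * rng.normal(size=anchor.shape)  # noisy start near the anchor
    x_t = (1.0 - t) * x0 + t * expert                    # point on the straight path
    target_velocity = expert - x0                        # constant velocity of that path
    return x_t, target_velocity


def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))


rng = np.random.default_rng(0)
horizon = 5
# Toy data: a straight-ahead anchor and an expert path that drifts sideways.
anchor = np.stack([np.linspace(0.0, 2.0, horizon), np.zeros(horizon)], axis=1)
expert = np.stack([np.linspace(0.0, 2.0, horizon), np.linspace(0.0, 0.5, horizon)], axis=1)

t = 0.3
x_t, v_target = flow_matching_pair(anchor, expert, t, rng)

# A real policy network would predict the velocity from (x_t, t, observation);
# here we only compare a perfect prediction with an all-zero one.
loss_perfect = flow_matching_loss(v_target, v_target)
loss_zero = flow_matching_loss(np.zeros_like(v_target), v_target)
```

Following the target velocity from `x_t` for the remaining time `1 - t` lands exactly on the expert trajectory, which is the property the learned velocity field is trained to reproduce.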
In practice
- Apply anchored flow matching for multimodal trajectory prediction.
- Use human-in-the-loop preference learning for robot-specific adaptation.
- Integrate gated attention to enhance scene context understanding.
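The preference fine-tuning step can be illustrated with a DPO-style loss, which is reward-free and implicitly regularizes toward a frozen reference policy. The summary does not name the objective FlowPilot uses, so this is a stand-in: `preference_loss`, `beta`, and the scores are hypothetical. A score could be a log-likelihood or, for a flow policy, a negative flow matching error.

```python
import math


def preference_loss(score_pref, score_rej, ref_score_pref, ref_score_rej, beta=1.0):
    """DPO-style loss for one (preferred, rejected) pair.

    score_*:     scores under the policy being fine-tuned (higher is better)
    ref_score_*: scores under the frozen pre-trained policy
    beta:        strength of the pull toward the reference policy
    """
    # How much more the tuned policy favors the preferred sample than the reference does.
    margin = (score_pref - ref_score_pref) - (score_rej - ref_score_rej)
    z = beta * margin
    # Numerically stable form of -log(sigmoid(z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Illustrative pair: the human's corrective action is "preferred", and the
# robot action that triggered the intervention is "rejected".
# Before fine-tuning, the policy equals the reference, so the margin is zero.
loss_at_reference = preference_loss(-2.0, -1.0, -2.0, -1.0)

# After fine-tuning, the policy scores the human correction higher
# and the rejected action lower than the reference does.
loss_after_update = preference_loss(-1.0, -3.0, -2.0, -1.0)
```

The loss depends only on score differences relative to the reference policy, so no separate reward model is needed, and large departures from the pre-trained behavior are only rewarded when they widen the preference margin.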
Topics
- Sidewalk Navigation
- Autonomous Robotics
- Imitation Learning
- Preference Learning
- Flow Matching
- Monocular Vision
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.