From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation
Summary
FlowPilot is a mapless navigation policy for autonomous long-horizon sidewalk navigation that uses only a single monocular RGB camera. It targets two limitations of traditional imitation learning: compounding errors and poor social compliance. The policy is pre-trained with anchored flow matching on extensive robot fleet data, which captures complex, multimodal sidewalk navigation behaviors. To improve counterfactual reasoning and social compliance, it is then fine-tuned with a human-in-the-loop preference learning scheme that needs only a small amount of human intervention data. Evaluated in diverse simulation and real-world environments, FlowPilot achieved a 42% success rate and 66% route completion in simulation. Its human-preference-tuned variant, FlowPilot-HP, further improved real-world robustness and social compliance, reducing the intervention metrics IR and NIR by 40.0% and 52.1%, respectively, compared to the base model, which makes it suitable for micro-mobility applications such as robotic food delivery.
Key takeaway
For robotics engineers developing autonomous micro-mobility solutions, FlowPilot offers an approach to long-horizon sidewalk navigation that goes beyond plain imitation learning. If your current imitation learning policies suffer from compounding errors or poor social compliance, consider pre-training with anchored flow matching and then fine-tuning with a human-in-the-loop preference learning scheme. In the reported results, this combination improved real-world robustness and social compliance and reduced intervention rates, which matters for safety in applications such as robotic food delivery or assistive wheelchairs.
Key insights
FlowPilot combines anchored flow matching with human-in-the-loop preference learning for robust, socially compliant sidewalk navigation.
Principles
- Imitation learning alone struggles with complex, social navigation.
- Human preference data improves counterfactual reasoning.
- Anchored flow matching captures diverse behaviors.
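To make the flow matching principle more concrete, here is a minimal sketch of a flow matching regression target in NumPy. It is not the authors' implementation. The summary does not define "anchored", so the sketch assumes it means the flow starts from a fixed anchor trajectory instead of random noise. The names `flow_matching_loss`, `integrate`, and `velocity_fn` are hypothetical, and the straight-line interpolation path is also an assumption.

```python
import numpy as np

def flow_matching_targets(anchor, expert, t):
    """Point on a straight path from anchor to expert at time t, plus its velocity.

    anchor, expert: arrays of shape (batch, waypoints, 2).
    t: array of shape (batch,) with values in [0, 1].
    ASSUMPTION: a linear path and an anchor trajectory as the start point.
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1, 1)  # broadcast over waypoints
    x_t = (1.0 - t) * anchor + t * expert
    v_target = expert - anchor  # velocity of a straight path is constant
    return x_t, v_target

def flow_matching_loss(velocity_fn, anchor, expert, t):
    """Mean squared error between predicted and target velocity."""
    x_t, v_target = flow_matching_targets(anchor, expert, t)
    v_pred = velocity_fn(x_t, t)  # hypothetical learned model
    return float(np.mean((v_pred - v_target) ** 2))

def integrate(velocity_fn, anchor, steps=10):
    """Generate a trajectory by Euler integration of the velocity field."""
    x = anchor.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

In a real policy, `velocity_fn` would also be conditioned on the camera image and the route. Under the stated assumption, using several different anchors is one plausible way a single model could represent several distinct behaviors, such as passing a pedestrian on the left or on the right.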
Method
Pre-train with anchored flow matching on robot fleet data, then fine-tune using human-in-the-loop preference learning with intervention data to enhance social compliance and counterfactual reasoning.
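The fine-tuning stage can be illustrated with a pairwise preference loss. The summary does not say which objective FlowPilot-HP uses, so the Bradley-Terry style loss below is a swapped-in, commonly used stand-in and not the paper's method. It assumes each human intervention yields a pair: the trajectory the human steered toward (preferred) and the trajectory the policy proposed (rejected). The function name `preference_loss` and the scalar scores are hypothetical.

```python
import numpy as np

def preference_loss(score_preferred, score_rejected):
    """Bradley-Terry style loss: -log sigmoid(score_preferred - score_rejected).

    ASSUMPTION: scores are scalar values a policy assigns to each trajectory,
    for example a log-likelihood; the source does not specify this.
    """
    margin = np.asarray(score_preferred, dtype=float) - np.asarray(score_rejected, dtype=float)
    # log(1 + exp(-margin)) computed stably via logaddexp
    return float(np.mean(np.logaddexp(0.0, -margin)))
```

The loss is smaller when the policy scores the human-preferred trajectory above its own rejected proposal, so minimizing it shifts the policy toward the corrected behavior at the moments where a human had to intervene.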
In practice
- Deploy FlowPilot for micro-mobility applications.
- Use human feedback to refine autonomous navigation.
- Leverage monocular RGB for lightweight perception.
Topics
- Sidewalk Navigation
- Autonomous Robotics
- Imitation Learning
- Preference Learning
- Flow Matching
- Micro-mobility
- Monocular Vision
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.