From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

Source: Artificial Intelligence · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

FlowPilot is a mapless navigation policy for autonomous long-horizon sidewalk navigation that uses only a single monocular RGB camera. It targets two limitations of traditional imitation learning: compounding errors and poor social compliance. The policy is pre-trained with anchored flow matching on extensive robot fleet data, which lets it capture complex, multimodal sidewalk navigation behaviors. To improve counterfactual reasoning and social compliance, it is then fine-tuned with a human-in-the-loop preference learning scheme that needs only a small amount of human intervention data. Evaluated in diverse simulation and real-world environments, FlowPilot achieved a 42% success rate and 66% route completion in simulation. Its human-preference-tuned variant, FlowPilot-HP, showed better real-world robustness and social compliance, reducing IR by 40.0% and NIR by 52.1% compared to the base model. The authors position it for micro-mobility applications such as robotic food delivery.

Key takeaway

For robotics engineers developing autonomous micro-mobility solutions, FlowPilot offers an approach to long-horizon sidewalk navigation that holds up outside simulation. If your current imitation learning policies suffer from compounding errors or poor social compliance, consider pre-training with anchored flow matching and then fine-tuning with a human-in-the-loop preference learning scheme. In the reported results, this combination improved real-world robustness and social compliance and lowered intervention rates, which matters for applications such as robotic food delivery or assistive wheelchairs.

Key insights

FlowPilot combines anchored flow matching with human-in-the-loop preference learning for robust, socially compliant sidewalk navigation.

Method

Pre-train with anchored flow matching on robot fleet data, then fine-tune using human-in-the-loop preference learning with intervention data to enhance social compliance and counterfactual reasoning.
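
The two training stages can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the FlowPilot implementation: "anchored" is read here as drawing the flow's source sample around an anchor trajectory rather than from zero-mean noise, and the preference stage is stood in for by a generic Bradley-Terry style pairwise loss, since the summary does not specify the exact objective. All function names and parameters (`sample_anchored_source`, `flow_matching_loss`, `preference_loss`, `sigma`, `beta`) are hypothetical.

```python
import math
import random


def sample_anchored_source(anchor, sigma, rng):
    """ASSUMPTION: the source sample is Gaussian noise centred on an anchor trajectory."""
    return [a + rng.gauss(0.0, sigma) for a in anchor]


def interpolate(x0, x1, t):
    """Point at time t on the straight line from source x0 to expert trajectory x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, anchor, expert, sigma, rng):
    """One-sample flow matching loss with a straight-line probability path.

    predict_velocity(x_t, t) stands in for the policy network; trajectories
    are flat lists of floats (for example, flattened 2D waypoints).
    """
    x0 = sample_anchored_source(anchor, sigma, rng)
    t = rng.random()                                # time drawn uniformly from [0, 1)
    x_t = interpolate(x0, expert, t)
    target = [b - a for a, b in zip(x0, expert)]    # straight-line velocity, constant in t
    pred = predict_velocity(x_t, t)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def preference_loss(score_preferred, score_rejected, beta=1.0):
    """Bradley-Terry style pairwise loss: -log sigmoid(beta * (s_pref - s_rej)).

    ASSUMPTION: the human intervention trajectory is the preferred sample and
    the policy's own rollout is the rejected one; scores are any scalar the
    policy assigns to a trajectory (for example, a log-likelihood).
    """
    margin = beta * (score_preferred - score_rejected)
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    rng = random.Random(0)
    anchor = [0.0, 0.0, 1.0, 0.0, 2.0, 0.0]   # hypothetical straight-ahead waypoints
    expert = [0.0, 0.0, 1.0, 0.3, 2.0, 0.8]   # hypothetical demonstrated path veering left

    def zero_policy(x_t, t):
        # Untrained stand-in that always predicts zero velocity.
        return [0.0] * len(x_t)

    print("flow matching loss:", flow_matching_loss(zero_policy, anchor, expert, 0.1, rng))
    print("preference loss:", preference_loss(score_preferred=-1.0, score_rejected=-0.5))
```

In this sketch the preference loss falls as the policy scores the human-corrected trajectory above its own rollout, which is one common way to turn intervention data into a training signal.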

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.