From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

2026-06-10 · Source: Artificial Intelligence · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

FlowPilot is a mapless navigation policy for autonomous long-horizon sidewalk navigation that uses only a single monocular RGB camera. It targets two limitations of traditional imitation learning: compounding errors and poor social compliance. The policy is pre-trained with anchored flow matching on extensive robot fleet data, which captures complex, multimodal sidewalk navigation behaviors. To strengthen counterfactual reasoning and social compliance, it is then fine-tuned with a human-in-the-loop preference learning scheme that needs only a small amount of human intervention data. In simulation, FlowPilot achieved a 42% success rate and 66% route completion. The human-preference-tuned variant, FlowPilot-HP, further improved real-world robustness and social compliance, reducing IR by 40.0% and NIR by 52.1% relative to the base model. These results make it a candidate for micro-mobility applications such as robotic food delivery.

Key takeaway

For robotics engineers developing autonomous micro-mobility solutions, FlowPilot offers a robust approach to long-horizon sidewalk navigation. If your current imitation learning policies suffer from compounding errors or social compliance issues, consider pre-training with anchored flow matching and then fine-tuning with a human-in-the-loop preference learning scheme. In the reported results, this combination improved real-world robustness and social compliance and lowered intervention rates, which matters for safety in robotic food delivery or assistive wheelchair applications.

Key insights

FlowPilot combines anchored flow matching with human-in-the-loop preference learning for robust, socially compliant sidewalk navigation.

Principles

Imitation learning alone struggles with complex, socially aware navigation.
Human preference data improves counterfactual reasoning.
Anchored flow matching captures diverse behaviors.

Method

Pre-train with anchored flow matching on robot fleet data, then fine-tune using human-in-the-loop preference learning with intervention data to enhance social compliance and counterfactual reasoning.
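
The two training stages described above can be sketched with two small loss functions. This is a minimal illustration under stated assumptions, not FlowPilot's actual implementation: the summary does not define "anchored" flow matching, so the sketch assumes one plausible reading in which the flow starts from an anchor trajectory and moves along a straight line to the demonstrated trajectory. The preference stage is shown as a generic Bradley-Terry style pairwise loss, which is a swapped-in stand-in for whatever objective the authors use. All names (flow_matching_loss, preference_loss, velocity_fn, beta) are hypothetical.

```python
import numpy as np


def flow_matching_loss(velocity_fn, anchors, expert, obs, rng):
    """Straight-line conditional flow matching loss (assumed formulation).

    anchors, expert: arrays of shape (batch, horizon, 2) holding waypoints.
    velocity_fn(x_t, t, obs) is a hypothetical policy network that returns
    a predicted velocity with the same shape as x_t.
    """
    # Sample one interpolation time per trajectory in [0, 1).
    t = rng.uniform(size=(expert.shape[0], 1, 1))
    # Point on the straight path from the anchor to the expert trajectory.
    x_t = (1.0 - t) * anchors + t * expert
    # Along a straight path the target velocity is constant in t.
    target_velocity = expert - anchors
    predicted = velocity_fn(x_t, t, obs)
    return float(np.mean((predicted - target_velocity) ** 2))


def preference_loss(score_preferred, score_rejected, beta=1.0):
    """Bradley-Terry pairwise loss: -log sigmoid(beta * (s_pref - s_rej)).

    The scores are assumed to be scalar policy scores (for example a
    log-likelihood proxy) for the human-corrected trajectory and for the
    trajectory the policy produced before the intervention.
    """
    margin = beta * (np.asarray(score_preferred) - np.asarray(score_rejected))
    # logaddexp(0, -m) equals -log(sigmoid(m)) and is numerically stable.
    return float(np.mean(np.logaddexp(0.0, -margin)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(4, 8, 2))  # toy anchor trajectories
    expert = rng.normal(size=(4, 8, 2))   # toy fleet demonstrations

    def untrained(x_t, t, obs):
        # Placeholder policy that always predicts zero velocity.
        return np.zeros_like(x_t)

    print("flow matching loss:", flow_matching_loss(untrained, anchors, expert, None, rng))
    print("preference loss:", preference_loss([0.5, 1.0], [0.2, -0.3]))
```

In this sketch the pre-training loss falls to zero when the network predicts the anchor-to-expert displacement exactly, and the preference loss shrinks as the human-preferred trajectory scores higher than the rejected one.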

In practice

Deploy FlowPilot for micro-mobility applications.
Use human feedback to refine autonomous navigation.
Leverage monocular RGB for lightweight perception.

Topics

Sidewalk Navigation
Autonomous Robotics
Imitation Learning
Preference Learning
Flow Matching
Micro-mobility
Monocular Vision

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.