From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems) · Depth: Expert, extended

Summary

FlowPilot is a mapless navigation policy for robust, efficient long-horizon sidewalk navigation that uses only a monocular RGB camera. Developed at UCLA, the framework targets two limitations of traditional imitation learning: compounding errors and poor social compliance. It pre-trains the policy with anchored flow matching on large-scale robot fleet data, which captures diverse, multimodal navigation behaviors. A human-in-the-loop preference learning stage then fine-tunes the policy on a small amount of human intervention data, improving counterfactual reasoning and social compliance. In simulation, FlowPilot-Base achieved a 42% success rate and 66% route completion. In real-world experiments, the preference-fine-tuned FlowPilot-HP was more robust and more socially compliant than the base model, reducing the Intervention Rate (IR) by 40.0% and the Normalized Intervention Rate by 52.1%.

Key takeaway

For robotics engineers deploying micro-mobility robots in complex sidewalk environments, relying solely on imitation learning for navigation policies risks poor social compliance and compounding errors. Consider fine-tuning pre-trained models with human-in-the-loop preference learning to improve robot-specific precision and adherence to social norms. In FlowPilot's real-world experiments, this approach reduced the Intervention Rate by 40.0% relative to the base model, which indicates better robustness and fewer required human interventions.

Key insights

Bridging imitation learning with human preference alignment significantly enhances robot navigation robustness and social compliance.

Method

Pre-train a policy using anchored flow matching on large datasets, then fine-tune it with reward-free preference learning from human interventions, regularizing towards the initial policy.
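The two stages above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not FlowPilot's implementation: the summary does not give the exact objectives, so the first stage is shown as generic linear-path flow matching (the paper's "anchored" variant is not reproduced here), and the second stage uses a DPO-style loss as a stand-in for a reward-free preference objective that stays close to the pre-trained policy. All function names, the `beta` parameter, and the assumption that trajectory log-probabilities are available are hypothetical.

```python
import math


def flow_matching_target(x0, x1, t):
    """Generic linear-path flow matching (assumed form, not the paper's).

    x0: source sample (e.g. noise), x1: demonstrated trajectory, t in [0, 1].
    Returns the interpolated point x_t and the velocity the model should predict.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)


def preference_loss(logp_pref, logp_rej, ref_logp_pref, ref_logp_rej, beta=0.1):
    """DPO-style reward-free preference loss (illustrative stand-in).

    logp_*     : log-probabilities under the policy being fine-tuned
    ref_logp_* : log-probabilities under the frozen pre-trained policy
    "pref" is the human-corrected behavior, "rej" is the behavior that
    triggered the intervention. Measuring both relative to the reference
    policy is what keeps the fine-tuned policy near the pre-trained one.
    """
    margin = beta * ((logp_pref - ref_logp_pref) - (logp_rej - ref_logp_rej))
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

In this sketch, a policy that has not yet moved from the reference yields a zero margin and a loss of log 2; the loss decreases as the policy assigns relatively more probability to the human-preferred behavior than the reference does.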

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.