From Imitation to Alignment: Human-Preference Flow Policies for Long-Horizon Sidewalk Navigation

2026-06-10 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

FlowPilot is a mapless navigation policy for robust, efficient long-horizon sidewalk navigation that uses only a monocular RGB camera. Developed at UCLA, the framework targets two weaknesses of conventional imitation learning: compounding errors and poor social compliance. It pre-trains the policy with anchored flow matching on large-scale robot fleet data, capturing diverse, multimodal navigation behaviors. A subsequent human-in-the-loop preference learning stage then fine-tunes the policy on a small amount of human intervention data, improving counterfactual reasoning and social compliance. In simulation, FlowPilot-Base achieved a 42% success rate and 66% route completion. In real-world experiments, the preference-fine-tuned FlowPilot-HP was more robust and more socially compliant than the base model, reducing the Intervention Rate (IR) by 40.0% and the Normalized Intervention Rate (NIR) by 52.1%.

Key takeaway

For robotics engineers deploying micro-mobility robots in complex sidewalk environments, relying solely on imitation learning for navigation policies risks poor social compliance and compounding errors. You should consider fine-tuning pre-trained models with human-in-the-loop preference learning to add robot-specific precision and better adherence to social norms. In FlowPilot's real-world tests this step reduced the Intervention Rate by 40.0% relative to the base model, which suggests a substantial gain in robustness and fewer required human interventions.

Key insights

Bridging imitation learning with human preference alignment significantly enhances robot navigation robustness and social compliance.

Principles

Anchored flow matching models multimodal action distributions.
Preference learning refines policies for robot-specific precision.
Gated cross-attention improves use of scene context, mitigating goal-driven shortcuts in which the policy leans on the goal signal and underuses what the camera sees.
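The summary does not spell out how anchored flow matching works. The sketch below assumes one common reading: instead of starting the flow from pure Gaussian noise, each training sample starts from a noisy copy of the nearest "anchor" trajectory (for example, a cluster center of fleet trajectories), and a velocity field is regressed along the straight-line path to the expert trajectory. The anchor set, `sigma`, and all function names are hypothetical, and the observation conditioning (camera features, goal) that a real policy needs is omitted.

```python
import numpy as np

# Hypothetical sketch of anchored flow matching for 2-D waypoint trajectories.
# Assumption: "anchored" = the flow starts from a noisy anchor trajectory
# instead of pure Gaussian noise. Observation conditioning is omitted.


def nearest_anchor(anchors: np.ndarray, traj: np.ndarray) -> int:
    """Index of the anchor (K, T, 2) closest to traj (T, 2) by mean waypoint L2 distance."""
    dists = np.linalg.norm(anchors - traj[None], axis=-1).mean(axis=-1)
    return int(np.argmin(dists))


def anchored_fm_loss(velocity_fn, anchors, expert_traj, rng, sigma=0.1):
    """Flow-matching regression loss for one expert trajectory."""
    k = nearest_anchor(anchors, expert_traj)
    x0 = anchors[k] + sigma * rng.standard_normal(expert_traj.shape)  # noisy anchor as source
    t = rng.uniform()                                                 # time in [0, 1)
    x_t = (1.0 - t) * x0 + t * expert_traj                            # straight-line path
    target_v = expert_traj - x0                                       # its constant velocity
    pred_v = velocity_fn(x_t, t)
    return float(np.mean((pred_v - target_v) ** 2))


def sample(velocity_fn, x0, num_steps=10):
    """Euler-integrate dx/dt = v(x, t) from t = 0 to t = 1, starting at x0."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x
```

In a trained policy, a network conditioned on the camera image and goal would replace `velocity_fn`. Because different anchors seed different starting points, sampling from several anchors yields a multimodal set of candidate trajectories rather than one averaged path.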

Method

Pre-train a policy with anchored flow matching on large-scale robot fleet data, then fine-tune it with reward-free preference learning on human intervention data, regularizing it toward the initial (pre-trained) policy.
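The summary names the fine-tuning objective only as reward-free preference learning regularized toward the initial policy. Direct Preference Optimization (DPO) is one well-known objective with exactly those two properties: it needs no reward model, and its implicit KL term keeps the policy close to a frozen reference. The sketch below uses DPO as a stand-in, not as FlowPilot's actual loss, and assumes each pair treats the human's corrective trajectory as preferred and the robot's pre-intervention trajectory as rejected. A flow policy has no cheap exact log-likelihood, so a practical version would need a surrogate, such as the negative flow-matching error that Diffusion-DPO uses for diffusion models.

```python
import numpy as np

# DPO used as a stand-in for reward-free preference fine-tuning (assumption).
# "pref" = human corrective trajectory, "rej" = robot trajectory it replaced.


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z)) = -log(1 + exp(-z))."""
    return -np.logaddexp(0.0, -z)


def dpo_loss(logp_pref, logp_rej, ref_logp_pref, ref_logp_rej, beta=0.1):
    """Mean DPO loss over a batch of preference pairs.

    logp_*     : log-likelihoods under the policy being fine-tuned
    ref_logp_* : log-likelihoods under the frozen pre-trained policy
    beta       : strength of the implicit KL pull toward the reference
    """
    logp_pref, logp_rej = np.asarray(logp_pref), np.asarray(logp_rej)
    ref_logp_pref, ref_logp_rej = np.asarray(ref_logp_pref), np.asarray(ref_logp_rej)
    # How much more the policy favors the preferred action than the reference does,
    # minus the same quantity for the rejected action.
    margin = (logp_pref - ref_logp_pref) - (logp_rej - ref_logp_rej)
    return float(np.mean(-log_sigmoid(beta * margin)))
```

When the policy and the reference assign identical log-likelihoods, the loss equals log 2; shifting probability toward the human-preferred trajectory (relative to the reference) lowers it, while the reference terms stop the policy from drifting arbitrarily far from its pre-trained behavior.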

In practice

Apply anchored flow matching for multimodal trajectory prediction.
Use human-in-the-loop preference learning for robot-specific adaptation.
Integrate gated cross-attention to strengthen the policy's use of scene context.
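The summary does not describe the gating mechanism. As an illustration only, the sketch below assumes a Flamingo-style design: policy tokens (for example, a goal embedding) attend to image features through single-head cross-attention, and the attended result is added back to the tokens through a learnable tanh gate. The projection weights and the `gate_alpha` parameter are hypothetical.

```python
import numpy as np

# Hypothetical Flamingo-style gated cross-attention (single head, no batching).


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_cross_attention(queries, context, w_q, w_k, w_v, gate_alpha):
    """queries: (Nq, d) policy tokens; context: (Nc, d) scene features.

    Returns queries + tanh(gate_alpha) * CrossAttention(queries, context).
    """
    q = queries @ w_q                          # (Nq, d)
    k = context @ w_k                          # (Nc, d)
    v = context @ w_v                          # (Nc, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # scaled dot-product, (Nq, Nc)
    attn = softmax(scores, axis=-1)            # each query's weights over scene tokens
    return queries + np.tanh(gate_alpha) * (attn @ v)
```

Initializing `gate_alpha` at zero makes the block an identity at the start of training, so the scene-context pathway is blended in gradually instead of disrupting existing features; as the gate opens, the output increasingly depends on the image features rather than on the query tokens alone.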

Topics

Sidewalk Navigation
Autonomous Robotics
Imitation Learning
Preference Learning
Flow Matching
Monocular Vision

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.