Beyond Waypoints: A Trajectory-Centric Waypointing Paradigm for Vision-Language Navigation

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

The paper introduces Trajectory Waypoint, a paradigm for Vision-Language Navigation in Continuous Environments (VLN-CE) that targets two common failures of traditional node-centric methods: unreachable waypoints and inconsistencies between planning and control. The framework comprises a Trajectory Waypoint Predictor (TWP) and a Trajectory-Enhanced Navigator (TEN). The TWP, formulated as a TSDF-guided diffusion policy, generates diverse, collision-free trajectory candidates and achieves a 95.84% Open score on the VLN-CE val-unseen split, an absolute gain of 8.58 over baselines. The TEN integrates these continuous paths into a hybrid map for instruction-grounded planning, tightly coupling high-level semantic decisions with low-level execution. On the R2R-CE val-unseen split, the full system reports an Oracle Success Rate (OSR) of 68.1, a Success Rate (SR) of 60.3, and a Success weighted by Path Length (SPL) of 51.4.
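For readers less familiar with the three navigation metrics quoted above, the following is a minimal sketch of how OSR, SR, and SPL are conventionally computed from per-episode logs. It is not taken from the paper: the `Episode` fields, the function name, and the 3 m success radius are assumptions made for illustration, and the episode values are invented toy numbers.

```python
from dataclasses import dataclass

# Assumption: 3 m is the success radius commonly used for R2R-CE evaluation.
SUCCESS_RADIUS_M = 3.0


@dataclass
class Episode:
    final_dist: float       # distance from the stop position to the goal (m)
    min_dist: float         # closest distance to the goal anywhere on the path (m)
    path_length: float      # length of the path the agent actually travelled (m)
    shortest_length: float  # shortest-path length from start to goal (m)


def navigation_metrics(episodes, radius=SUCCESS_RADIUS_M):
    """Return OSR, SR and SPL as percentages over a list of episodes."""
    n = len(episodes)
    sr = osr = spl = 0.0
    for e in episodes:
        success = e.final_dist < radius
        sr += success                    # stopped within the radius
        osr += e.min_dist < radius       # passed within the radius at any point
        if success:
            # Successful episodes are discounted by path efficiency.
            spl += e.shortest_length / max(e.path_length, e.shortest_length)
    return {"OSR": 100 * osr / n, "SR": 100 * sr / n, "SPL": 100 * spl / n}


# Toy episodes (hypothetical values, not results from the paper).
episodes = [
    Episode(final_dist=1.0, min_dist=0.5, path_length=12.0, shortest_length=10.0),
    Episode(final_dist=5.0, min_dist=2.0, path_length=8.0, shortest_length=9.0),
    Episode(final_dist=2.5, min_dist=2.5, path_length=10.0, shortest_length=10.0),
    Episode(final_dist=6.0, min_dist=4.0, path_length=15.0, shortest_length=11.0),
]
metrics = navigation_metrics(episodes)
```

Under these definitions OSR is never lower than SR, and SR is never lower than SPL, which is consistent with the ordering of the reported 68.1, 60.3, and 51.4.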

Key takeaway

For machine learning engineers building embodied navigation systems, a trajectory-centric waypoint paradigm can improve agent reliability. Because each waypoint is grounded in an executable path, candidates are collision-free by construction and what the planner selects is what the controller executes, which reduces errors in complex environments. Consider TSDF-guided diffusion policies for trajectory generation, and augment training with additional scene datasets such as HM3D to improve generalization. The approach directly targets geometric unreachability and the disconnect between planning and control.

Key insights

Grounding waypoints in executable trajectories resolves planning-control inconsistencies and improves reachability in Vision-Language Navigation.

Principles

Decoupling waypoint planning from low-level control causes inconsistencies.
Trajectory-centric planning improves safety.
TSDF guidance enhances path feasibility.

Method

A Trajectory Waypoint Predictor (TWP) uses a TSDF-guided diffusion policy to generate collision-free trajectory candidates. A Trajectory-Enhanced Navigator (TEN) then chooses among these candidates by integrating their geometry into a hybrid map used for instruction-grounded planning.
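To make the idea of TSDF guidance concrete, here is a toy 2D sketch of the guidance term alone: a trajectory is nudged down the gradient of a collision cost defined on a truncated signed distance field. This is not the paper's implementation. The learned denoiser of the diffusion policy is omitted entirely, the single circular obstacle stands in for a real TSDF volume, and every name and constant (`TRUNC`, `MARGIN`, `guided_refine`, the guidance scale) is hypothetical.

```python
import numpy as np

TRUNC = 0.5    # truncation distance in metres (hypothetical)
MARGIN = 0.3   # desired clearance from obstacles in metres (hypothetical)
CENTER = np.array([1.0, 0.1])  # toy circular obstacle standing in for a real TSDF
RADIUS = 0.4


def tsdf(points, center, radius):
    """Truncated signed distance to the obstacle; positive means free space."""
    d = np.linalg.norm(points - center, axis=-1) - radius
    return np.clip(d, -TRUNC, TRUNC)


def collision_cost(traj, center, radius):
    """Squared hinge penalty on points closer than MARGIN to the obstacle."""
    gap = np.maximum(0.0, MARGIN - tsdf(traj, center, radius))
    return float(np.sum(gap ** 2))


def cost_gradient(traj, center, radius, eps=1e-4):
    """Central finite-difference gradient of the cost for each coordinate."""
    grad = np.zeros_like(traj)
    for idx in np.ndindex(traj.shape):
        step = np.zeros_like(traj)
        step[idx] = eps
        up = collision_cost(traj + step, center, radius)
        down = collision_cost(traj - step, center, radius)
        grad[idx] = (up - down) / (2 * eps)
    return grad


def guided_refine(traj, center, radius, steps=50, guidance_scale=0.2):
    """Repeated cost-gradient nudges; a real sampler would interleave these
    with denoising steps. Start and end points are kept fixed."""
    traj = traj.copy()
    for _ in range(steps):
        grad = cost_gradient(traj, center, radius)
        traj[1:-1] -= guidance_scale * grad[1:-1]
    return traj


# A straight path from (0, 0) to (2, 0) that cuts through the obstacle.
straight = np.stack([np.linspace(0.0, 2.0, 11), np.zeros(11)], axis=1)
refined = guided_refine(straight, CENTER, RADIUS)
```

In this toy setting the refined path bends around the obstacle until every point sits at roughly the requested clearance. In a guided diffusion sampler, a gradient term of this kind would typically be added at each denoising step rather than applied as a separate post-processing loop.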

In practice

Use DINOv3 for robust visual features.
Apply TSDF-based cost guidance for safety.
Augment training data with HM3D scenes.

Topics

Vision-Language Navigation
Trajectory Generation
Diffusion Models
Embodied AI
Robot Navigation
TSDF

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.