Unified Map Prior Encoder for Mapping and Planning

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

UMPE, a Unified Map Prior Encoder, addresses the underuse of diverse map priors (HD vector maps, SD vector maps, rasterized SD maps, and satellite imagery) in online mapping and end-to-end (E2E) planning for autonomous driving. It can ingest any subset of these four priors and fuses them with Bird's-Eye-View (BEV) features. Its vector encoder pre-aligns polylines with an SE(2) correction, encodes points with multi-frequency sinusoidal features, and produces polyline tokens with confidence scores. Its raster encoder uses a FiLM-conditioned ResNet-18 backbone, performs SE(2) micro-alignment, and injects priors through zero-initialized residual fusion. On nuScenes, UMPE raises MapTRv2 from 61.5 to 67.4 mAP (+5.9) and MapQR from 66.4 to 71.7 mAP (+5.3); on Argoverse2 it adds +4.1 mAP. For E2E planning with a VAD backbone on nuScenes, it reduces the L2 trajectory error from 0.72 m to 0.42 m and the collision rate from 0.22% to 0.12%.
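The vector-encoder steps described above (SE(2) pre-alignment, then multi-frequency sinusoidal point features pooled into a polyline token) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the coordinate normalization range `max_range`, the frequency schedule 2^k·π, and the mean-pooling aggregator are all hypothetical choices made for illustration. How UMPE estimates the SE(2) correction and the confidence scores is not covered here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def se2_apply(points: List[Point], dx: float, dy: float, dtheta: float) -> List[Point]:
    """Apply a small SE(2) correction: rotate by dtheta, then translate by (dx, dy)."""
    c, s = math.cos(dtheta), math.sin(dtheta)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


def sinusoidal_encode(p: Point, num_freqs: int = 4, max_range: float = 50.0) -> List[float]:
    """Multi-frequency sin/cos features for one BEV point.

    Hypothetical scheme: each coordinate is divided by max_range (metres), then
    encoded at frequencies 2^k * pi for k = 0..num_freqs-1.
    """
    feats: List[float] = []
    for coord in p:
        u = coord / max_range
        for k in range(num_freqs):
            w = (2.0 ** k) * math.pi
            feats.extend((math.sin(w * u), math.cos(w * u)))
    return feats


def encode_polyline(points: List[Point],
                    correction: Tuple[float, float, float],
                    num_freqs: int = 4) -> List[float]:
    """Align a polyline, encode each point, and mean-pool into one token.

    Mean pooling is a simple stand-in for whatever learned aggregator UMPE uses.
    """
    aligned = se2_apply(points, *correction)
    per_point = [sinusoidal_encode(p, num_freqs) for p in aligned]
    dim = len(per_point[0])
    return [sum(f[i] for f in per_point) / len(per_point) for i in range(dim)]


token = encode_polyline([(0.0, 0.0), (10.0, 0.0), (20.0, 5.0)], (0.5, -0.2, 0.01))
print(len(token))
```

Using several frequencies lets nearby points differ in the high-frequency channels while the low-frequency channels still encode coarse position across the BEV range.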
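The two raster-side mechanisms named above, FiLM conditioning and zero-initialized residual fusion, can be sketched on a single BEV cell's channel vector. This is an illustrative sketch only: the summary does not say what signal conditions the FiLM layers or how the zero-initialized path is parameterized, so the per-channel `gate` below (and the class name `ZeroInitResidualFusion`) are hypothetical. A zero-initialized convolution would behave the same way at initialization.

```python
from typing import List


def film(features: List[float], gamma: List[float], beta: List[float]) -> List[float]:
    """FiLM: per-channel affine modulation, out_c = gamma_c * x_c + beta_c."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


class ZeroInitResidualFusion:
    """Add a prior feature to a BEV feature through a gate that starts at zero.

    Hypothetical parameterization: one learnable scalar gate per channel.
    Because the gate starts at 0.0, the output initially equals the BEV input,
    so a pretrained mapping or planning model is unchanged until training
    moves the gate away from zero.
    """

    def __init__(self, channels: int) -> None:
        self.gate = [0.0] * channels  # learnable in a real model

    def __call__(self, bev: List[float], prior: List[float]) -> List[float]:
        return [x + a * p for x, a, p in zip(bev, self.gate, prior)]


fuse = ZeroInitResidualFusion(3)
bev = [0.2, -1.0, 0.5]
prior = film([1.0, 2.0, 3.0], gamma=[1.0, 0.5, 2.0], beta=[0.0, 0.1, -1.0])
print(fuse(bev, prior))  # identical to bev at initialization
```

The zero-start design lets the prior branch be attached to an existing backbone without disturbing its behavior on the first training step.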

Key takeaway

For autonomous driving engineers building mapping and planning systems, UMPE shows that integrating heterogeneous map priors through a single, alignment-aware encoder can substantially improve performance. Consider similar multi-source prior fusion to improve both mapping accuracy and E2E planning robustness, especially when priors come from varied sources and may be misaligned because of pose drift. On the reported benchmarks, this approach yields clear gains over sensor-centric baselines such as MapTRv2, MapQR, and VAD.

Key insights

Unified, alignment-aware fusion of heterogeneous map priors significantly enhances autonomous driving mapping and planning.

Principles

Method

UMPE uses separate vector and raster encoders with SE(2) alignment, cross-attention biased by token confidence, and zero-initialized residual fusion, fusing vector priors first and raster priors second.
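The confidence-biased cross-attention and the vector-then-raster fusion order can be illustrated with a single-head, single-query sketch. The exact form of UMPE's confidence bias is not given in this summary; here it is assumed to be an additive log(confidence) term on each token's attention logit, which down-weights low-confidence (for example, misaligned) polyline tokens. The scalar gates `gate_v` and `gate_r` and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_biased_attention(query: List[float], keys: List[List[float]],
                                values: List[List[float]], conf: List[float],
                                eps: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention with an assumed log(confidence) logit bias."""
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + math.log(c + eps)
              for key, c in zip(keys, conf)]
    weights = softmax(logits)
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights


def fuse_vector_then_raster(bev: List[float], keys: List[List[float]],
                            values: List[List[float]], conf: List[float],
                            raster_prior: List[float],
                            gate_v: float, gate_r: float) -> List[float]:
    """Vector residual first, then raster residual; gates start at 0.0 in training."""
    attended, _ = confidence_biased_attention(bev, keys, values, conf)
    h = [x + gate_v * a for x, a in zip(bev, attended)]
    return [x + gate_r * r for x, r in zip(h, raster_prior)]


_, w = confidence_biased_attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]],
                                   [[10.0, 0.0], [0.0, 10.0]], [0.9, 0.1])
print(w)  # with identical keys, the weights follow the confidence ratio
```

With identical keys the dot-product terms cancel in the softmax, so the attention weights reduce to the normalized confidences, which makes the effect of the bias easy to see.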

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.