Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

2026-06-07 · Source: Machine Learning · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

An end-to-end Deep Reinforcement Learning (DRL) approach is proposed for Autonomous Underwater Vehicles (AUVs). It maps raw sensor data directly to thruster commands, replacing the multi-stage perception, planning, and control pipelines that are usually engineered by hand. The method uses a hierarchical reinforcement learning (HRL) architecture that splits the task into two Markov Decision Processes. A High-Level (HL) policy, running at 2 Hz, processes 84 × 84 pixel monocular camera frames, 100 × 100 pixel forward-looking imaging sonar, and proprioceptive data to generate spatial subgoals. A Low-Level (LL) policy, running at 10 Hz, translates these subgoals into thruster commands. The HL policy is trained using Reinforcement Learning with Prior Data (RLPD) within a modified Sample-Efficient Robotic Reinforcement Learning (SERL) framework, while the LL policy uses Soft Actor-Critic (SAC) combined with Hindsight Experience Replay (HER). Evaluated in the HoloOcean simulator, the system avoids obstacles and produces trajectories whose lengths are within 4% to 6% of an RRT* planning baseline. It is also robust to sensor noise and reduced visibility, though generalization to novel obstacle shapes remains a limitation.
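The comparison against the RRT* baseline comes down to a ratio of path lengths. The sketch below shows one way such an overhead figure could be computed from waypoint lists. The function names and the example paths are hypothetical illustrations, not code or data from the paper.

```python
import math

def path_length(waypoints):
    """Sum of straight-line segment lengths along a polyline of (x, y, z) points."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))

def length_overhead(policy_path, baseline_path):
    """Relative extra length of the policy path compared with the baseline path."""
    baseline = path_length(baseline_path)
    return (path_length(policy_path) - baseline) / baseline

# Hypothetical paths for illustration only (not results from the paper).
baseline_path = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
policy_path = [(0.0, 0.0, 0.0), (5.0, 1.0, 0.0), (10.0, 0.0, 0.0)]

overhead = length_overhead(policy_path, baseline_path)
print(f"overhead relative to baseline: {overhead:.1%}")
```

An overhead between 0.04 and 0.06 from a function like this would correspond to the 4% to 6% range reported in the summary.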

Key takeaway

For robotics engineers designing autonomous underwater vehicle (AUV) navigation systems, this hierarchical reinforcement learning (HRL) approach offers a way to replace a multi-stage control pipeline with two learned policies. Consider HRL for end-to-end control, mapping raw sensor data directly to thruster commands, especially where robustness to sensor noise is critical. Expect weaker performance in environments whose obstacle geometries differ from those seen in training, since generalization to novel shapes is the reported limitation.

Key insights

An end-to-end hierarchical DRL system enables AUV navigation from raw sensor data to thruster commands.

Principles

Decompose complex control into hierarchical policies.
Integrate prior demonstrations for sample efficiency.
Combine SAC with HER for low-level control.
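The third principle relies on Hindsight Experience Replay, which turns failed episodes into useful training data by pretending the goal was a position the vehicle actually reached. A minimal sketch follows, assuming a sparse reward with a distance tolerance and the "final" relabeling strategy. The summary does not say which strategy or reward the authors use, so `Transition`, `sparse_reward`, `her_relabel_final`, and the 0.5 tolerance are all hypothetical.

```python
import math
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Transition:
    achieved: tuple  # position reached after the action
    goal: tuple      # subgoal the low-level policy was asked to reach
    reward: float

def sparse_reward(achieved, goal, tol=0.5):
    """Assumed reward shape: 0 within tol of the goal, -1 otherwise."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0

def her_relabel_final(episode):
    """Copy the episode, swapping each goal for the final achieved position
    and recomputing the reward against that substituted goal."""
    new_goal = episode[-1].achieved
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.achieved, new_goal))
        for t in episode
    ]

# A failed episode: the original goal (5, 5, 0) was never reached.
episode = [
    Transition(achieved=(1.0, 0.0, 0.0), goal=(5.0, 5.0, 0.0), reward=-1.0),
    Transition(achieved=(2.0, 0.0, 0.0), goal=(5.0, 5.0, 0.0), reward=-1.0),
]
relabeled = her_relabel_final(episode)
for t in relabeled:
    print(t)
```

In an off-policy learner such as SAC, both the original and the relabeled transitions would typically be stored in the replay buffer, so the critic sees at least some goal-reaching examples even early in training.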

Method

A hierarchical reinforcement learning architecture pairs two policies running at different rates. The High-Level policy runs at 2 Hz and turns raw sensor data into spatial subgoals. The Low-Level policy runs at 10 Hz and turns each subgoal into thruster commands.
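With rates of 2 Hz and 10 Hz, the Low-Level policy takes five steps for every High-Level decision. The sketch below illustrates that two-rate loop with placeholder policies and toy dynamics. The policy bodies, the proportional command, and the state update are assumptions for illustration and stand in for the learned networks and the simulator.

```python
HL_RATE_HZ = 2
LL_RATE_HZ = 10
LL_STEPS_PER_HL_STEP = LL_RATE_HZ // HL_RATE_HZ  # five LL steps per HL step

def high_level_policy(state):
    """Placeholder for the learned HL policy: returns a fixed spatial subgoal."""
    return (1.0, 0.0, 0.0)

def low_level_policy(state, subgoal):
    """Placeholder for the learned LL policy: a proportional command toward the subgoal."""
    return tuple(0.1 * (g - s) for s, g in zip(state, subgoal))

def run(num_ll_steps, state=(0.0, 0.0, 0.0)):
    """Step the LL policy every tick and refresh the subgoal every fifth tick."""
    log = []
    subgoal = None
    for step in range(num_ll_steps):
        if step % LL_STEPS_PER_HL_STEP == 0:
            subgoal = high_level_policy(state)
            log.append(("HL", step))
        command = low_level_policy(state, subgoal)
        # Toy dynamics: the command is applied directly as a position change.
        state = tuple(s + c for s, c in zip(state, command))
        log.append(("LL", step))
    return state, log

final_state, log = run(LL_RATE_HZ)  # one simulated second
print(final_state)
print([entry for entry in log if entry[0] == "HL"])
```

Holding the subgoal fixed between High-Level updates is one common design choice for this kind of hierarchy. The summary does not state how the authors handle the interval between subgoals.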

In practice

Apply HRL for AUV obstacle avoidance.
Enhance robustness to sensor noise in underwater systems.

Topics

Autonomous Underwater Vehicles
Hierarchical Reinforcement Learning
End-to-End Control
Motion Planning
Obstacle Avoidance
Sensor Fusion

Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.