Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
Summary
An end-to-end Deep Reinforcement Learning (DRL) approach is proposed for Autonomous Underwater Vehicles (AUVs), aiming to map raw sensor data directly to thruster commands and reduce reliance on complex engineering pipelines. The method employs a hierarchical reinforcement learning (HRL) architecture that splits the task into two Markov Decision Processes. A High-Level (HL) policy, operating at 2 Hz, processes 84 × 84 pixel monocular camera frames, 100 × 100 pixel forward-looking imaging sonar, and proprioceptive data to generate spatial subgoals. A Low-Level (LL) policy, operating at 10 Hz, translates these subgoals into thruster commands. The HL policy is trained using Reinforcement Learning with Prior Data (RLPD) within a modified Sample-Efficient Robotic Reinforcement Learning (SERL) framework, while the LL policy uses Soft Actor-Critic (SAC) combined with Hindsight Experience Replay (HER). Evaluated in the HoloOcean simulator, the system successfully avoids obstacles and produces trajectories whose lengths are within 4% to 6% of an RRT* planning baseline. It is also robust to sensor noise and decreased visibility, though generalization to novel obstacle shapes remains a limitation.
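To make the role of prior demonstrations concrete, the sketch below shows the symmetric sampling idea usually associated with RLPD: each training batch draws half of its transitions from a demonstration buffer and half from the online replay buffer. The function name, the buffer representation as plain lists, and the 50/50 split are illustrative assumptions; the summary does not state what ratio or buffer layout the modified SERL framework uses.

```python
import random


def sample_symmetric_batch(demo_buffer, online_buffer, batch_size, rng=random):
    """Draw roughly half of a batch from demonstrations, the rest from online data.

    Hypothetical helper for illustration; buffers are plain lists of transitions.
    """
    n_demo = batch_size // 2
    n_online = batch_size - n_demo
    # Sample with replacement so small buffers early in training still work.
    demo_part = [rng.choice(demo_buffer) for _ in range(n_demo)]
    online_part = [rng.choice(online_buffer) for _ in range(n_online)]
    batch = demo_part + online_part
    rng.shuffle(batch)  # avoid a fixed demo/online ordering inside the batch
    return batch
```

Keeping demonstrations in every batch means the policy keeps seeing successful behaviour even while its own early rollouts are mostly failures.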
Key takeaway
For robotics engineers designing autonomous underwater vehicle (AUV) navigation systems, this hierarchical reinforcement learning (HRL) approach offers a way to simplify complex control pipelines. Consider HRL for end-to-end control that maps raw sensor data directly to thruster commands, especially where robustness to sensor noise is critical. Be mindful, however, that the learned policy may not generalize to environments with novel obstacle geometries.
Key insights
An end-to-end hierarchical DRL system enables AUV navigation from raw sensor data to thruster commands.
Principles
- Decompose complex control into hierarchical policies.
- Integrate prior demonstrations for sample efficiency.
- Combine SAC with HER for low-level control.
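As an illustration of the last principle, here is a minimal sketch of HER's "final" relabeling strategy: after an episode, every transition is stored a second time with the goal replaced by the position the vehicle actually reached, so even a failed episode yields reward signal. The transition keys, the sparse 0/-1 reward, and the 0.5 m tolerance are assumptions made for the example, not details taken from the paper.

```python
import math


def sparse_reward(achieved, goal, tol=0.5):
    """Return 0.0 within tol of the goal, else -1.0 (assumed reward shape)."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_relabel_final(episode, tol=0.5):
    """Relabel each transition with the position reached at the end of the episode.

    `episode` is a list of dicts with hypothetical keys "achieved_next",
    "goal" and "reward". The input is left unmodified.
    """
    new_goal = episode[-1]["achieved_next"]
    relabeled = []
    for transition in episode:
        relabeled.append({
            **transition,
            "goal": new_goal,
            "reward": sparse_reward(transition["achieved_next"], new_goal, tol),
        })
    return relabeled
```

Both the original and the relabeled transitions would then go into the replay buffer that the off-policy learner (SAC here) samples from.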
Method
A hierarchical reinforcement learning architecture pairs a 2 Hz High-Level policy, which generates spatial subgoals from raw sensor data, with a 10 Hz Low-Level policy, which translates those subgoals into thruster commands.
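The two rates imply that the Low-Level policy runs five control steps for every High-Level decision. The sketch below shows one way such a two-rate loop could be scheduled; the policy callables, the observation function, and the choice to refresh the subgoal on every fifth step are placeholders for illustration rather than the authors' implementation.

```python
HL_RATE_HZ = 2
LL_RATE_HZ = 10
LL_STEPS_PER_HL = LL_RATE_HZ // HL_RATE_HZ  # 5 low-level steps per subgoal


def run_hierarchy(high_level_policy, low_level_policy, get_observation, n_ll_steps):
    """Run the loop at the low-level rate, refreshing the subgoal at the high-level rate.

    All three callables are hypothetical stand-ins for trained policies
    and a simulator interface.
    """
    subgoal = None
    commands = []
    hl_calls = 0
    for step in range(n_ll_steps):
        obs = get_observation(step)
        if step % LL_STEPS_PER_HL == 0:
            # High-level decision: camera, sonar and proprioception -> spatial subgoal.
            subgoal = high_level_policy(obs)
            hl_calls += 1
        # Low-level decision: current state and held subgoal -> thruster command.
        commands.append(low_level_policy(obs, subgoal))
    return commands, hl_calls
```

Holding the subgoal fixed between High-Level updates gives the Low-Level policy a stable target for the intervening control steps.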
In practice
- Apply HRL for AUV obstacle avoidance.
- Enhance robustness to sensor noise in underwater systems.
Topics
- Autonomous Underwater Vehicles
- Hierarchical Reinforcement Learning
- End-to-End Control
- Motion Planning
- Obstacle Avoidance
- Sensor Fusion
Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.