Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training
Summary
This work presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) search-and-rescue (SAR) missions, designed for settings where only limited simulation training is available. The framework pairs a fixed rule-based high-level advisor with an online goal-conditioned low-level reinforcement learning (RL) controller. The high-level advisor is defined offline from a structured task specification and provides interpretable guidance: which actions are recommended, which should be avoided, and how much weight the advice carries in each operating regime (the arbitration weights). The low-level controller learns online from dense rewards and reuses experience through a mode-aware prioritized replay mechanism augmented with rule-derived metadata. Evaluated on battery-aware multi-goal delivery and moving-target delivery in obstacle-rich environments, the method is reported to improve early safety, through fewer collision terminations, and sample efficiency, while retaining online adaptability to scenario-specific dynamics.
Key takeaway
For research scientists developing autonomous systems for critical missions such as search-and-rescue, this framework shows one way to add safety guidance without giving up online learning. Consider a hierarchical decision-making structure that combines deterministic rule-based guidance with online reinforcement learning, especially when pre-training data or simulation time is limited. In the reported evaluation, this combination reduced collision terminations and improved early-training performance.
Key insights
A hybrid rule-based and RL framework enhances UAV mission safety and efficiency in limited-simulation SAR.
Principles
- Combine fixed rules with online learning.
- Prioritize safety and sample efficiency.
- Ensure adaptability to new scenarios.
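The first principle can be made concrete with a small sketch of how fixed advice might be blended with learned action values. This is an illustration only, not the paper's implementation: the rule set, the action names, the `Advice` structure, the `bonus` parameter, and the linear blending formula are all assumptions chosen to show recommended and avoided actions combined with a regime-dependent arbitration weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advice:
    recommended: frozenset  # actions the rules favour
    avoided: frozenset      # actions the rules discourage
    weight: float           # arbitration weight in [0, 1] for the current regime


def advise(obstacle_near: bool, battery_low: bool) -> Advice:
    """Hypothetical fixed rule table; a real one would come from the task specification."""
    if obstacle_near:
        # Safety-critical regime: the rules dominate the decision.
        return Advice(frozenset({"climb"}), frozenset({"forward"}), 0.9)
    if battery_low:
        return Advice(frozenset({"return_home"}), frozenset(), 0.6)
    # Nominal regime: mostly defer to the learned controller.
    return Advice(frozenset(), frozenset(), 0.1)


def arbitrate(q_values: dict, advice: Advice, bonus: float = 1.0) -> str:
    """Blend learned values with rule bias (assumed linear mix) and pick the best action."""
    scores = {}
    for action, q in q_values.items():
        bias = 0.0
        if action in advice.recommended:
            bias += bonus
        if action in advice.avoided:
            bias -= bonus
        scores[action] = (1.0 - advice.weight) * q + advice.weight * bias
    return max(scores, key=scores.get)


# Learned values that, on their own, would always choose "forward".
q = {"forward": 1.0, "climb": 0.2, "return_home": 0.0}
nominal_choice = arbitrate(q, advise(obstacle_near=False, battery_low=False))
obstacle_choice = arbitrate(q, advise(obstacle_near=True, battery_low=False))
```

With these assumed numbers the learned preference wins in the nominal regime, while the rules override it near an obstacle. Because the advisor stays fixed, only the low-level values change during online learning.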
Method
The method uses an offline rule-based high-level advisor for guidance and an online goal-conditioned low-level RL controller that learns from dense rewards and reuses experience with rule-derived metadata.
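As a rough illustration of the experience-reuse idea, the sketch below stores each transition together with rule-derived metadata (the advisor's mode and a rule-violation flag) and samples with priorities that favour the current mode. The class name, the priority formula, and the `mode_boost` and `violation_boost` factors are assumptions made for this example; the summary does not specify how the paper computes priorities.

```python
import random


class ModeAwareReplay:
    """Toy prioritized replay buffer keyed on advisor mode (illustrative only)."""

    def __init__(self, capacity=10_000, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha            # how strongly TD error shapes priority
        self.items = []               # list of (transition, mode, priority)
        self.rng = random.Random(seed)

    def add(self, transition, mode, td_error, rule_violation=False,
            violation_boost=2.0):
        # Base priority from TD error, as in standard prioritized replay.
        priority = (abs(td_error) + 1e-3) ** self.alpha
        # Assumed use of rule metadata: replay rule-violating steps more often.
        if rule_violation:
            priority *= violation_boost
        if len(self.items) >= self.capacity:
            self.items.pop(0)         # drop the oldest transition
        self.items.append((transition, mode, priority))

    def weights(self, current_mode, mode_boost=3.0):
        # Transitions recorded in the current mode get a larger sampling weight.
        return [p * (mode_boost if m == current_mode else 1.0)
                for _, m, p in self.items]

    def sample(self, batch_size, current_mode, mode_boost=3.0):
        picks = self.rng.choices(self.items,
                                 weights=self.weights(current_mode, mode_boost),
                                 k=batch_size)
        return [t for t, _, _ in picks]


buf = ModeAwareReplay(capacity=3)
buf.add("t0", mode="cruise", td_error=1.0)
buf.add("t1", mode="avoid", td_error=1.0)
buf.add("t2", mode="avoid", td_error=1.0, rule_violation=True)
batch = buf.sample(batch_size=4, current_mode="avoid")
```

Sampling is with replacement, so a batch may repeat transitions. A production buffer would typically use a sum-tree for efficient sampling and importance-sampling weights to correct the bias that prioritization introduces.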
In practice
- Apply to battery-aware multi-goal delivery.
- Use for moving-target delivery in obstacle-rich environments.
Topics
- Hierarchical Reinforcement Learning
- UAV Search-and-Rescue
- Rule-based Guidance
- Goal-Conditioned RL
- Limited Simulation Training
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.