Dynamic Dual-Granularity Skill Bank for Agentic RL
Summary
D2Skill is a dynamic dual-granularity skill bank designed to enhance agentic reinforcement learning (RL) by organizing reusable experience. It features task skills for high-level guidance and step skills for fine-grained decision support and error correction. The system jointly trains its policy and skill bank using paired baseline and skill-injected rollouts, deriving hindsight utility signals from the performance gap between them; these signals drive both skill updates and policy optimization. The skill bank is built entirely from training-time experience, expands continuously through reflection, and is maintained via utility-aware retrieval and pruning. Experiments on ALFWorld and WebShop with Qwen2.5-7B-Instruct and Qwen3-4B-Instruct-2507 show that D2Skill improves success rates by 10-20 points over skill-free baselines. Both dual-granularity skill modeling and dynamic skill maintenance are critical to these performance gains.
Key takeaway
For research scientists building agentic RL systems, D2Skill shows that a dynamically maintained, dual-granularity skill bank can raise success rates by 10-20 points over skill-free baselines on ALFWorld and WebShop. Consider combining high-level task skills with fine-grained step skills, and pair them with utility-aware retrieval and pruning so the bank supports both policy optimization and error correction. The reported gains come with modest training overhead.
Key insights
D2Skill enhances agentic RL by dynamically managing dual-granularity skills for improved policy learning and error correction.
Principles
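- Task skills (high-level guidance) and step skills (fine-grained decision support) together improve RL agents.
- Hindsight utility, derived from the gap between skill-injected and baseline rollouts, drives both skill updates and policy optimization.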
- Dynamic skill maintenance is critical for gains.
Method
D2Skill jointly trains the policy and the skill bank using paired baseline and skill-injected rollouts, deriving hindsight utility from the performance gap between them. The bank expands continuously through reflection on training-time experience and is maintained by utility-aware retrieval and pruning.
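The paired-rollout idea can be illustrated with a minimal sketch. It assumes that hindsight utility is the mean return of skill-injected rollouts minus the mean return of baseline rollouts on the same task, and that a skill's stored utility is an exponential moving average of those gaps. The function names, the `momentum` parameter, and the averaging rule are illustrative assumptions, not details taken from the original article.

```python
from statistics import mean


def hindsight_utility(baseline_returns, skill_returns):
    """Performance gap between skill-injected and baseline rollouts.

    Assumption: the gap is a difference of mean episode returns
    (e.g. 1.0 for success, 0.0 for failure) on the same task.
    """
    return mean(skill_returns) - mean(baseline_returns)


def update_utility(old_utility, gap, momentum=0.9):
    """Blend a new gap into a skill's running utility (hypothetical EMA rule)."""
    return momentum * old_utility + (1.0 - momentum) * gap


# Toy example: four baseline rollouts vs. four rollouts with a skill injected.
baseline = [0.0, 1.0, 0.0, 0.0]    # 1 of 4 episodes succeeds without the skill
with_skill = [1.0, 1.0, 0.0, 1.0]  # 3 of 4 episodes succeed with the skill
gap = hindsight_utility(baseline, with_skill)
new_utility = update_utility(0.0, gap)
```

A positive gap means the injected skill helped on that task; a negative gap means it hurt, which is the kind of signal a pruning rule could later act on.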
In practice
- Implement task skills for high-level guidance.
- Use step skills for fine-grained decision support.
- Employ utility-aware retrieval for skill management.
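The three practices above can be sketched together as a small skill bank. This is a toy illustration under stated assumptions: the `Skill` and `SkillBank` classes, keyword-overlap matching, the `weight` mixing factor, and the `min_utility` pruning threshold are all hypothetical, and the original article does not specify how retrieval scores or pruning criteria are computed.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    text: str             # natural-language guidance shown to the agent
    granularity: str      # "task" (high-level) or "step" (fine-grained)
    keywords: set         # hypothetical retrieval key
    utility: float = 0.0  # running hindsight utility estimate


@dataclass
class SkillBank:
    skills: list = field(default_factory=list)

    def add(self, skill):
        self.skills.append(skill)

    def retrieve(self, query_words, granularity, k=1, weight=0.5):
        """Rank skills of one granularity by keyword overlap plus utility."""
        def score(s):
            overlap = len(s.keywords & query_words) / max(len(s.keywords), 1)
            return overlap + weight * s.utility

        pool = [s for s in self.skills if s.granularity == granularity]
        return sorted(pool, key=score, reverse=True)[:k]

    def prune(self, min_utility=0.0):
        """Drop skills whose utility is below the threshold."""
        self.skills = [s for s in self.skills if s.utility >= min_utility]


bank = SkillBank()
bank.add(Skill("Locate the target object before interacting with it.",
               "task", {"find", "put"}, utility=0.4))
bank.add(Skill("If a container is closed, open it before searching.",
               "step", {"closed", "open"}, utility=0.3))
bank.add(Skill("Repeat the last action when stuck.",
               "step", {"closed", "stuck"}, utility=-0.2))

# Both step skills match "closed" equally; utility breaks the tie.
step_hits = bank.retrieve({"closed", "drawer"}, "step")
bank.prune()  # removes the skill with negative utility
```

Keeping granularity as a field on one record type lets task-level and step-level retrieval share the same scoring and pruning code; a real system would likely replace keyword overlap with embedding similarity.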
Topics
- Agentic Reinforcement Learning
- D2Skill
- Dual-Granularity Skills
- Dynamic Skill Maintenance
- Skill Bank
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.