Learning and Reusing Policy Decompositions for Hierarchical Generalized Planning with LLM Agents
Summary
Hierarchical Component Learning for Generalized Policies (HCL-GP) is a dynamic policy-learning approach that combines generalized planning with hierarchical task decomposition for LLM-based agents. It learns parameterized policies that generalize across task instances, automatically extracts reusable components from successful executions, and organizes them into a component library for compositional policy generation. In doing so, HCL-GP addresses three challenges: automated decomposition, generalizing components so they can be reused, and efficient retrieval via semantic search. On the AppWorld benchmark, HCL-GP achieved 98.2% accuracy on normal tasks and 97.8% on challenge tasks with unseen applications, a 15.8 percentage point improvement over static synthesis in challenging scenarios. For open-source models like OpenAI GPT OSS 120B, dynamic reuse enabled a 62.5% success rate, compared to near-zero without reuse. These results indicate that classical planning concepts can be integrated with LLM agents to improve both accuracy and efficiency.
Key takeaway
For research scientists developing LLM-based agents for complex, multi-step domains, a dynamic policy-learning architecture like HCL-GP is worth adopting. Learning and reusing executable policy components improves efficiency and accuracy, with the largest gains in challenging scenarios and when using less capable open-source models. Prioritize execution-grounded learning to build a validated, compositional knowledge base that generalizes across applications.
Key insights
Dynamic policy learning with reusable components improves LLM agent accuracy and efficiency across diverse tasks, with the largest gains on challenge tasks involving unseen applications and on less capable models.
Principles
- Generalize policies across task instances.
- Decompose complex tasks into reusable sub-tasks.
- Induce reusable structure dynamically from executions.
Method
HCL-GP synthesizes parameterized policies, extracts reusable components from successful executions, and generalizes them through clustering and deduplication for a validated component repository, all without symbolic domain models.
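The library-building step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Component` and `ComponentLibrary` classes, the similarity threshold, and the bag-of-words `embed` function (a toy stand-in for a real embedding model) are all assumptions made for illustration. A component extracted from a successful execution is compared against existing clusters; exact duplicates are dropped and near matches are grouped together.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical reusable policy component extracted from a successful run."""
    name: str
    description: str
    params: tuple[str, ...]
    source: str  # executable code of the component


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ComponentLibrary:
    """Groups functionally similar components and rejects exact duplicates."""
    threshold: float = 0.6  # assumed value, not taken from the paper
    clusters: list[list[Component]] = field(default_factory=list)

    def add(self, comp: Component) -> bool:
        """Add a component; return False if it duplicates an existing one."""
        vec = embed(comp.description)
        for cluster in self.clusters:
            representative = cluster[0]
            if cosine(vec, embed(representative.description)) >= self.threshold:
                if any(c.source == comp.source for c in cluster):
                    return False  # identical code already stored
                cluster.append(comp)
                return True
        self.clusters.append([comp])  # no similar cluster: start a new one
        return True
```

In a real system the descriptions would be embedded with a learned model and each component would be validated by execution before being admitted; the sketch only shows the clustering and deduplication logic.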
In practice
- Use semantic search for component retrieval.
- Implement iterative validation with debugging feedback.
- Cluster components by functional similarity.
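Two of the practices above, semantic retrieval and iterative validation with debugging feedback, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than HCL-GP's actual code: `retrieve` and `synthesize_with_validation` are hypothetical names, the bag-of-words `embed` function stands in for a real embedding model, and `propose` and `check` are placeholders for an LLM call and an execution-based validator.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, library: dict[str, str], k: int = 2) -> list[str]:
    """Return names of the k components whose descriptions best match the query.

    `library` maps a component name to its natural-language description.
    """
    q = embed(query)
    ranked = sorted(library, key=lambda name: cosine(q, embed(library[name])), reverse=True)
    return ranked[:k]


def synthesize_with_validation(
    propose: Callable[[Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Propose a policy, validate it, and feed any error back into the next proposal.

    `propose` receives the previous error message (or None on the first round).
    `check` returns None when the candidate passes, otherwise an error message.
    Returns the first passing candidate, or None if every round fails.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = propose(feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate
    return None
```

In use, `propose` would wrap a model call whose prompt includes the retrieved components and the latest error, and `check` would execute the candidate against the task environment. Bounding the loop with `max_rounds` keeps a failing task from consuming unbounded model calls.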
Topics
- Hierarchical Generalized Planning
- LLM Agents
- Policy Decomposition
- Component Reuse
- AppWorld Benchmark
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.