Learning and Reusing Policy Decompositions for Hierarchical Generalized Planning with LLM Agents

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, extended

Summary

Hierarchical Component Learning for Generalized Policies (HCL-GP) is a dynamic policy-learning approach that integrates generalized planning and hierarchical task decomposition for LLM-based agents. It learns parameterized policies that generalize across task instances, automatically extracts reusable components from successful executions, and organizes them into a component library for compositional policy generation. In doing so, HCL-GP addresses three challenges: automated decomposition, generalizing components for reuse, and efficient retrieval via semantic search. Evaluated on the AppWorld benchmark, HCL-GP achieved 98.2% accuracy on normal tasks and 97.8% on challenge tasks with unseen applications, a 15.8 percentage point improvement over static synthesis in challenging scenarios. For open-source models like OpenAI GPT OSS 120B, dynamic reuse enabled a 62.5% success rate, compared to near-zero without reuse. These results indicate that classical planning concepts can be integrated with LLM agents to improve both accuracy and efficiency.

Key takeaway

Research scientists developing LLM-based agents for complex, multi-step domains should consider a dynamic policy-learning architecture like HCL-GP. Learning and reusing executable policy components improved both efficiency and accuracy in the reported results, especially in challenging scenarios and when using less capable open-source models. Prioritize execution-grounded learning to build a compositional knowledge base that generalizes across applications.

Key insights

Dynamic policy learning with reusable components improves both the accuracy and the efficiency of LLM agents, with the largest reported gains on challenge tasks and with less capable models.

Method

HCL-GP synthesizes parameterized policies, extracts reusable components from successful executions, and generalizes them through clustering and deduplication. The result is a validated component repository, built without any symbolic domain model.
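The library side of this method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Component`, `ComponentLibrary`, `embed`, and `dedup_threshold` are hypothetical, and a bag-of-words cosine similarity stands in for the semantic search the summary mentions. A real system would presumably use learned embeddings and would validate components by executing them.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for a semantic embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Component:
    """A reusable, parameterized policy fragment (hypothetical structure)."""
    name: str
    description: str
    params: tuple[str, ...]
    uses: int = 1  # how many successful executions produced this component


@dataclass
class ComponentLibrary:
    dedup_threshold: float = 0.8  # assumed cutoff for treating two components as duplicates
    components: list[Component] = field(default_factory=list)

    def add(self, candidate: Component) -> Component:
        """Store a component from a successful run, merging near-duplicates."""
        for existing in self.components:
            similarity = cosine(embed(existing.description), embed(candidate.description))
            if similarity >= self.dedup_threshold:
                existing.uses += 1
                return existing
        self.components.append(candidate)
        return candidate

    def retrieve(self, task: str, k: int = 2) -> list[Component]:
        """Return the k components whose descriptions best match the task text."""
        query = embed(task)
        ranked = sorted(
            self.components,
            key=lambda c: cosine(query, embed(c.description)),
            reverse=True,
        )
        return ranked[:k]


library = ComponentLibrary()
library.add(Component("login", "log in to an app with username and password",
                      ("app", "username", "password")))
# Near-duplicate description: merged into the existing "login" component.
library.add(Component("login_v2", "log in to an app with a username and password",
                      ("app", "username", "password")))
library.add(Component("paginate", "fetch all pages of results from a list endpoint",
                      ("endpoint",)))

best = library.retrieve("log in to the spotify app with my password", k=1)
print(best[0].name, [c.name for c in library.components])
```

Merging on description similarity is only one plausible reading of "clustering and deduplication"; comparing component code or execution traces would be an equally reasonable design.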

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.