Learning and Reusing Policy Decompositions for Hierarchical Generalized Planning with LLM Agents

2026-05-11 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, extended

Summary

Hierarchical Component Learning for Generalized Policies (HCL-GP) is a dynamic policy-learning approach that integrates generalized planning and hierarchical task decomposition for LLM-based agents. It learns parameterized policies that generalize across task instances and automatically extracts reusable components from successful executions, organizing them into a component library for compositional policy generation. HCL-GP addresses three challenges: automated decomposition, component generalization for reuse, and efficient retrieval via semantic search. Evaluated on the AppWorld benchmark, HCL-GP achieved 98.2% accuracy on normal tasks and 97.8% on challenge tasks with unseen applications, a 15.8-percentage-point improvement over static synthesis on the challenge tasks. For open-source models such as OpenAI GPT OSS 120B, dynamic reuse enabled a 62.5% success rate, compared to near-zero without reuse. Together, these results indicate that classical planning concepts can be integrated with LLM agents to improve both accuracy and efficiency.

Key takeaway

For research scientists developing LLM-based agents for complex, multi-step domains, a dynamic policy-learning architecture like HCL-GP is worth adopting. Learning and reusing executable policy components improves both efficiency and accuracy, with the largest gains on challenge tasks involving unseen applications and when using less capable open-source models. Prioritize execution-grounded learning to build a compositional component library that generalizes across applications.

Key insights

Dynamic policy learning with reusable components improves LLM agent accuracy and efficiency: on AppWorld, HCL-GP reached 98.2% on normal tasks and 97.8% on challenge tasks with unseen applications.

Principles

Generalize policies across task instances.
Decompose complex tasks into reusable sub-tasks.
Induce reusable structure dynamically from executions.
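To make the first two principles concrete, here is a minimal sketch of a parameterized policy composed from reusable sub-tasks. Everything in it is an illustrative assumption: the in-memory PLAYLISTS data and the function names are hypothetical stand-ins, not AppWorld's API or the paper's generated code.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for an app backend (not AppWorld's real API).
PLAYLISTS: Dict[str, List[dict]] = {
    "road trip": [{"title": "A", "seconds": 200}, {"title": "B", "seconds": 340}],
    "focus": [{"title": "C", "seconds": 610}],
}


def get_playlist(name: str) -> List[dict]:
    """Reusable sub-task: fetch the songs of one playlist by name."""
    return PLAYLISTS[name]


def longest_song(songs: List[dict]) -> dict:
    """Reusable sub-task: pick the longest song from any list of songs."""
    return max(songs, key=lambda song: song["seconds"])


def longest_song_in_playlist(playlist_name: str) -> str:
    """Parameterized policy: one program for every instance of the task
    'find the longest song in playlist X', composed from the sub-tasks."""
    return longest_song(get_playlist(playlist_name))["title"]
```

The same policy handles any playlist name without being regenerated, and each sub-task could be reused by a different policy, which is the property a component library relies on.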

Method

HCL-GP synthesizes parameterized policies, extracts reusable components from successful executions, and generalizes those components through clustering and deduplication to build a validated component repository, all without symbolic domain models.
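The clustering and deduplication step could look roughly like the sketch below. The summary does not say which similarity measure or clustering algorithm HCL-GP uses, so token-level Jaccard similarity, the greedy single-pass grouping, and the 0.5 threshold are all assumptions chosen to keep the example dependency-free.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set of a component description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_components(descriptions: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Greedy single-pass clustering (an assumed strategy): each description
    joins the first cluster whose representative (its first member) is at
    least `threshold` similar, otherwise it starts a new cluster."""
    clusters: List[List[int]] = []
    for index, description in enumerate(descriptions):
        current = tokens(description)
        for cluster in clusters:
            representative = tokens(descriptions[cluster[0]])
            if jaccard(current, representative) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


def deduplicate(descriptions: List[str], threshold: float = 0.5) -> List[str]:
    """Keep one representative description per cluster."""
    return [descriptions[c[0]] for c in cluster_components(descriptions, threshold)]
```

A real system would more plausibly compare embeddings or code behavior than raw word overlap; the point here is only the shape of the pipeline: group near-duplicates, then keep one validated representative per group.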

In practice

Use semantic search for component retrieval.
Implement iterative validation with debugging feedback.
Cluster components by functional similarity.
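The first two practices can be sketched as follows. This is a toy illustration under stated assumptions: embed() is a bag-of-words stand-in for a real embedding model, and propose() is a hypothetical callback standing in for an LLM call that regenerates a candidate from the previous error message. Neither name comes from the paper.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, library: Dict[str, str], k: int = 1) -> List[str]:
    """Return the names of the k components whose descriptions best match."""
    q = embed(query)
    ranked = sorted(
        library, key=lambda name: cosine(q, embed(library[name])), reverse=True
    )
    return ranked[:k]


def refine_until_valid(
    propose: Callable[[Optional[str]], Callable[[], object]],
    expected: object,
    max_rounds: int = 3,
) -> Tuple[bool, List[str]]:
    """Run a candidate, and on failure feed the error text back to `propose`
    (hypothetical LLM stand-in) for another attempt, up to max_rounds."""
    feedback: Optional[str] = None
    log: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)
        try:
            result = candidate()
        except Exception as exc:  # execution errors become debugging feedback
            feedback = f"{type(exc).__name__}: {exc}"
            log.append(feedback)
            continue
        if result == expected:
            return True, log
        feedback = f"wrong result: {result!r}"
        log.append(feedback)
    return False, log
```

In use, retrieve() would select candidate components for a new task description, and refine_until_valid() would gate what enters the library, so only components that pass execution checks are stored for reuse.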

Topics

Hierarchical Generalized Planning
LLM Agents
Policy Decomposition
Component Reuse
AppWorld Benchmark

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.