The Context Gathering Decision Process: A POMDP Framework for Agentic Search

2026-05-11 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

Researchers from the University of Michigan and Netflix formalize context management for Large Language Model (LLM) agents in complex environments as the Context Gathering Decision Process (CGDP), a specialized Partially Observable Markov Decision Process (POMDP). They propose that LLM behavior in a CGDP can be modeled as approximate Thompson Sampling, and they introduce Predicate-Based Adaptive Identification (PBAI), an abstract algorithm that decomposes agentic search into modular operations. From this framework they derive two plug-and-play interventions: a persistent, predicate-based belief state that bounds context while preserving multi-hop reasoning, and a programmatic exhaustion gate that halts unproductive search. Across four methods and three question-answering domains (LoCoMo, MuSiQue, SWE-QA-Pro), the CGDP-motivated belief state improves multi-hop reasoning by up to 11.4%, and programmatic exhaustion detection saves up to 39% of tokens without degrading performance, outperforming LLM self-assessment as a stopping signal.

Key takeaway

For AI architects and NLP engineers designing LLM agentic systems, the Context Gathering Decision Process (CGDP) framework offers two concrete ways to improve reliability and efficiency. First, implement an orchestrator-managed, predicate-based belief state to prevent context degradation and improve multi-hop reasoning. Second, rather than relying on LLM self-assessment to decide when to stop, add a programmatic exhaustion gate driven by metrics such as Jaccard similarity and Unique Passage Rate. In the reported experiments this reduces token consumption by up to 39% while avoiding both premature stopping and infinite loops.
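
A programmatic exhaustion gate of the kind described above could be sketched as follows. This is a minimal illustration, not the paper's implementation: the summary names Jaccard similarity and Unique Passage Rate but does not define how they are computed or combined, so the function names, the choice to compare consecutive queries by token overlap, and both thresholds (`max_query_sim`, `min_unique_rate`) are assumptions.

```python
from typing import Iterable, List, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; treated as 1.0 when both sets are empty."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def unique_passage_rate(retrieved_ids: List[str], seen_ids: Set[str]) -> float:
    """Fraction of this step's distinct passages that were never retrieved before."""
    distinct = set(retrieved_ids)
    if not distinct:
        return 0.0
    return len(distinct - seen_ids) / len(distinct)


def search_exhausted(
    prev_query: str,
    query: str,
    retrieved_ids: List[str],
    seen_ids: Set[str],
    max_query_sim: float = 0.8,    # hypothetical threshold
    min_unique_rate: float = 0.2,  # hypothetical threshold
) -> bool:
    """Return True when the agent is repeating itself and finding nothing new.

    Assumed rule: stop only if the new query is nearly identical to the
    previous one AND almost every retrieved passage has been seen already.
    """
    query_sim = jaccard(prev_query.lower().split(), query.lower().split())
    upr = unique_passage_rate(retrieved_ids, seen_ids)
    return query_sim >= max_query_sim and upr <= min_unique_rate
```

The orchestrator would call `search_exhausted` after each retrieval step and halt the loop on `True`, so the stopping decision never depends on asking the model whether it is done.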

Key insights

Formalizing LLM agent search as a CGDP enables modular interventions to improve context management and search efficiency.

Principles

Orchestrators should dictate operations, not representations.
Explicit state tracking mitigates LLM context degradation.
Programmatic heuristics outperform LLM self-assessment for stopping.

Method

Each iteration of the Predicate-Based Adaptive Identification (PBAI) loop has four steps: evaluate the stopping criteria, select an action that resolves an open predicate, observe the environment's feedback, and update a belief state consisting of established facts and open questions.
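
The loop described above can be sketched as a small orchestrator. This is an illustrative skeleton under stated assumptions, not the authors' algorithm: `BeliefState`, `pbai_loop`, the three callables, and the toy knowledge base `KB` are hypothetical names, and the stopping rule (no open questions left, or a step budget) is a simplification. In a real agent, `select_action` and `update` would be LLM calls and `observe` would be a retriever or tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Observation = Optional[Tuple[str, List[str]]]  # (fact, follow-up questions) or None


@dataclass
class BeliefState:
    facts: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)


def pbai_loop(
    question: str,
    select_action: Callable[[BeliefState], str],
    observe: Callable[[str], Observation],
    update: Callable[[BeliefState, str, Observation], BeliefState],
    max_steps: int = 8,
) -> BeliefState:
    belief = BeliefState(open_questions=[question])
    for _ in range(max_steps):
        if not belief.open_questions:                  # 1. evaluate stopping criteria
            break
        action = select_action(belief)                 # 2. pick an open predicate to resolve
        observation = observe(action)                  # 3. observe environment feedback
        belief = update(belief, action, observation)   # 4. update facts and open questions
    return belief


# Toy two-hop environment standing in for retrieval (hypothetical data).
KB = {
    "Who directed the film?": ("The film was directed by Ada.", ["Where was Ada born?"]),
    "Where was Ada born?": ("Ada was born in Lyon.", []),
}


def toy_update(belief: BeliefState, action: str, obs: Observation) -> BeliefState:
    remaining = [q for q in belief.open_questions if q != action]
    if obs is None:  # nothing found: drop the question in this toy setup
        return BeliefState(list(belief.facts), remaining)
    fact, follow_ups = obs
    return BeliefState(belief.facts + [fact], remaining + follow_ups)


final = pbai_loop("Who directed the film?", lambda b: b.open_questions[0], KB.get, toy_update)
print(final.facts)
```

Because the orchestrator owns the loop and only passes the belief state between steps, each operation can be swapped independently, which is the modularity the framework is meant to provide.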

In practice

Implement a predicate-based belief state for multi-hop reasoning.
Use programmatic exhaustion gates to prevent redundant search.
Prefer freeform text over rigid JSON for belief state representation.
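
As one way to act on the first and third recommendations above, the orchestrator can keep facts and open questions as plain strings and render them into a short text block for each prompt, instead of forcing the model to read and write a JSON schema. The sketch below is an assumption-laden illustration: `render_belief`, the section labels, and the keep-the-most-recent-facts truncation policy (`max_facts`) are hypothetical and not taken from the paper.

```python
from typing import List


def render_belief(facts: List[str], open_questions: List[str], max_facts: int = 20) -> str:
    """Render the belief state as freeform text, keeping only the newest facts.

    Capping the number of facts is an assumed, simple way to bound prompt size.
    """
    kept = facts[-max_facts:] if max_facts > 0 else []
    lines = ["Facts established so far:"]
    lines += [f"- {fact}" for fact in kept] or ["- (none yet)"]
    lines.append("Open questions:")
    lines += [f"- {q}" for q in open_questions] or ["- (none)"]
    return "\n".join(lines)


prompt_context = render_belief(
    facts=["The film was directed by Ada."],
    open_questions=["Where was Ada born?"],
)
print(prompt_context)
```

The rendered block replaces the raw search transcript in the next prompt, so context size depends on the belief state rather than on how many steps the agent has taken.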

Topics

Context Gathering Decision Process
LLM Agents
Partially Observable Markov Decision Process
Predicate-Based Belief State
Programmatic Exhaustion Gate

Best for: AI Architect, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.