LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

LedgerAgent is an inference-time method that improves policy-adherent tool-calling agents by managing task state explicitly. Conventional agents keep observations, tool returns, and policy instructions interleaved in the prompt, so task state is tracked only implicitly. As a result, they can act on stale or incorrect information or violate policy. LedgerAgent instead records observed task state in a separate, schema-anchored typed ledger and renders that ledger into the agent's prompt. It also adds a policy gate that checks each environment-changing tool call against domain rules and the current ledger state before execution, and blocks calls that would violate policy.

LedgerAgent was evaluated on four customer-service domains (Airline, Retail, Telecom, Telehealth) with a panel of six models: GPT-5.2, GPT-4.1, Kimi K2.5, GLM-5, MiniMax-M2.5, and Qwen3-30B. It consistently improved average pass^k. Across the panel, average pass^1 rose by 3.4 to 15.5 points and average pass^4 by 5.6 to 15.5 points, especially on tasks that require environment-changing actions. These gains came without additional token overhead, unlike IRMA, which incurred over 50% overhead. Error analysis shows that the remaining failures are mainly missed actions and domain-specific argument errors.
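pass^k scores an agent on consistency: a task counts only when the agent succeeds in all k trials. The summary does not define the estimator. The sketch below assumes the one used by τ-bench, in which a task with c successes out of n trials contributes C(c, k)/C(n, k) and scores are averaged over tasks. The function `pass_hat_k` and the trial outcomes are hypothetical.

```python
from math import comb


def pass_hat_k(results: list[list[bool]], k: int) -> float:
    """Estimate pass^k as the mean over tasks of C(c, k) / C(n, k).

    Assumption: tau-bench-style estimator; each inner list holds the
    pass/fail outcomes of n independent trials of one task.
    """
    scores = []
    for trials in results:
        n = len(trials)
        if n < k:
            raise ValueError(f"need at least k={k} trials per task, got {n}")
        c = sum(trials)  # number of successful trials for this task
        # math.comb returns 0 when c < k, so a task needs k successes to score.
        scores.append(comb(c, k) / comb(n, k))
    return sum(scores) / len(scores)


# Hypothetical outcomes: 3 tasks x 4 trials each.
trials = [
    [True, True, True, True],      # always succeeds
    [True, False, True, True],     # succeeds 3 of 4 times
    [False, False, False, False],  # never succeeds
]
print(round(pass_hat_k(trials, 1), 3))  # 0.583
print(round(pass_hat_k(trials, 4), 3))  # 0.333
```

On these made-up outcomes, pass^1 is 0.583 while pass^4 drops to 0.333. The task that succeeds three times out of four contributes nothing once all four trials must pass.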

Key takeaway

If you deploy conversational AI agents for customer service as an MLOps engineer, consider integrating explicit state management like LedgerAgent. A structured ledger of tool outputs keeps the agent from acting on stale information. A pre-execution policy gate blocks policy-violating calls before they reach the environment. In the reported evaluation, this combination raised average pass^k without adding token overhead. In practice, record read-tool outputs in a structured ledger and place a policy gate in front of every environment-changing call. This matters most where write operations are critical.
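A minimal sketch of the two components working together follows. It assumes a single airline-style entity. Successful read returns are written to a ledger, and each write call must pass every domain predicate before it reaches the backend. The tool names, the predicates, the `dispatch` wrapper, and the `fake_backend` stub are hypothetical illustrations, not LedgerAgent's interface. Dropping the ledger entry after a successful write, so that the next write needs a fresh read, is also an assumption of this sketch.

```python
from typing import Any, Callable, Optional

# Hypothetical types: entity key -> latest observed fields, and a predicate that
# returns a violation message (or None when the call is allowed).
Ledger = dict[str, dict[str, Any]]
Predicate = Callable[[dict[str, Any], Ledger], Optional[str]]
Backend = Callable[[str, dict[str, Any]], dict[str, Any]]


def entity_key(reservation_id: str) -> str:
    return f"reservation:{reservation_id}"


def was_read(args: dict[str, Any], ledger: Ledger) -> Optional[str]:
    # A write may only target an entity the agent has actually observed.
    key = entity_key(args["reservation_id"])
    return None if key in ledger else f"{key} has not been read in this session"


def not_already_cancelled(args: dict[str, Any], ledger: Ledger) -> Optional[str]:
    record = ledger.get(entity_key(args["reservation_id"]), {})
    return "reservation is already cancelled" if record.get("status") == "cancelled" else None


# Hypothetical domain rules: write tool -> predicates that must all pass.
POLICY: dict[str, list[Predicate]] = {
    "cancel_reservation": [was_read, not_already_cancelled],
}


def fake_backend(tool: str, args: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for the real environment.
    if tool == "get_reservation":
        return {"reservation_id": args["reservation_id"], "status": "active"}
    if tool == "cancel_reservation":
        return {"reservation_id": args["reservation_id"], "status": "cancelled"}
    return {"error": f"unknown tool {tool}"}


def dispatch(tool: str, args: dict[str, Any], ledger: Ledger, backend: Backend) -> dict[str, Any]:
    """Gate write calls before execution; record successful read returns."""
    if tool in POLICY:
        reasons = [msg for check in POLICY[tool] if (msg := check(args, ledger))]
        if reasons:
            # Blocked: the environment is never touched, and the agent sees why.
            return {"blocked": True, "reasons": reasons}
        result = backend(tool, args)
        if "error" not in result:
            # Assumption: forget the entity after a write so later writes need a fresh read.
            ledger.pop(entity_key(args["reservation_id"]), None)
        return result
    result = backend(tool, args)
    if "error" not in result:
        ledger[entity_key(result["reservation_id"])] = result
    return result


ledger: Ledger = {}
print(dispatch("cancel_reservation", {"reservation_id": "R1"}, ledger, fake_backend))
# {'blocked': True, 'reasons': ['reservation:R1 has not been read in this session']}
dispatch("get_reservation", {"reservation_id": "R1"}, ledger, fake_backend)
print(dispatch("cancel_reservation", {"reservation_id": "R1"}, ledger, fake_backend))
# {'reservation_id': 'R1', 'status': 'cancelled'}
```

The sketch returns violation reasons to the caller instead of raising. This lets an agent loop feed the rejection back to the model and try again. The summary does not say how LedgerAgent surfaces blocked calls.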

Key insights

Explicitly managing observed task state and pre-checking actions prevents policy violations in tool-calling agents.

Principles

Method

LedgerAgent records successful read-tool returns in a schema-anchored typed ledger and renders that ledger into the prompt. Before any environment-changing call executes, a policy gate evaluates it against the ledger state and domain predicates and blocks it if it would violate policy.
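The sketch below illustrates one possible reading of "schema-anchored" and "typed". Each read-tool return is validated against a declared field schema, and fields outside the schema are dropped. The newest read for an entity replaces the older one, and the ledger is rendered as a sorted, stable text block for the prompt. `SCHEMA`, `ID_FIELD`, and `TypedLedger` are hypothetical names. The paper's actual schema format and rendering are not given in this summary.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical schema: entity type -> {field name: expected Python type}.
SCHEMA: dict[str, dict[str, type]] = {
    "user": {"user_id": str, "membership": str},
    "reservation": {"reservation_id": str, "status": str, "cabin": str, "num_passengers": int},
}
# Hypothetical: which field identifies each entity.
ID_FIELD = {"user": "user_id", "reservation": "reservation_id"}


@dataclass
class TypedLedger:
    # (entity type, entity id) -> validated fields from the latest successful read
    entries: dict[tuple[str, str], dict[str, Any]] = field(default_factory=dict)

    def record(self, entity: str, payload: dict[str, Any]) -> None:
        """Validate a successful read-tool return against SCHEMA and store it."""
        spec = SCHEMA[entity]  # raises KeyError for entity types with no schema
        typed: dict[str, Any] = {}
        for name, expected in spec.items():
            if name not in payload:
                raise ValueError(f"{entity} return is missing field {name!r}")
            if not isinstance(payload[name], expected):
                raise TypeError(f"{entity}.{name} should be {expected.__name__}")
            typed[name] = payload[name]
        # Fields outside the schema are dropped; a newer read overwrites older state.
        self.entries[(entity, typed[ID_FIELD[entity]])] = typed

    def render(self) -> str:
        """Serialize the ledger as a stable, sorted text block for the prompt."""
        lines = ["Observed state (ledger):"]
        for (entity, key), fields in sorted(self.entries.items()):
            body = ", ".join(f"{name}={value!r}" for name, value in fields.items())
            lines.append(f"- {entity} {key}: {body}")
        return "\n".join(lines)


ledger = TypedLedger()
ledger.record("reservation", {"reservation_id": "R1", "status": "active",
                              "cabin": "economy", "num_passengers": 2, "trace_id": "t-17"})
ledger.record("user", {"user_id": "u42", "membership": "gold"})
ledger.record("reservation", {"reservation_id": "R1", "status": "cancelled",
                              "cabin": "economy", "num_passengers": 2})
print(ledger.render())
# Observed state (ledger):
# - reservation R1: reservation_id='R1', status='cancelled', cabin='economy', num_passengers=2
# - user u42: user_id='u42', membership='gold'
```

Keeping one validated record per entity means the prompt shows only the latest observed value, not every historical tool return. The sorted rendering keeps the block identical across turns when the state has not changed.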

Topics

Best for: Research Scientist, AI Architect, Machine Learning Engineer, AI Scientist, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.