Progressive Autonomy as Preference Learning: A Formalization of Trust Calibration for Agentic Tool Use

2026-05-21 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, long

Summary

A new formalization of trust calibration for agentic tool use frames the choice between autonomous agent action and human approval as a preference-learning problem. A policy gateway maintains a Gaussian-process posterior over a latent human risk-tolerance function, updates it from binary approve/deny feedback, and escalates to the human those actions whose approval is most uncertain. The setup is structurally similar to Preferential Bayesian Optimization, but its goal is to classify the action space into allow, block, and ask regions rather than to optimize. A time-decaying kernel models human risk tolerance that changes over time, and the GP shows correlated generalization: feedback on one action informs decisions on similar actions. In a simulation study of 1,500 decision points across 6 seeds, the GP gateway auto-decided 68% of actions at 97.3% accuracy with a 2.4% false-allow rate on validation, reached 99.7% accuracy post-changepoint, and reduced human interruptions by approximately 1.8x. However, the "ask band" acquisition rule was not more sample-efficient than random querying.

Key takeaway

For MLOps engineers deploying agentic systems, consider a learning-based policy gateway for action approval. Because it adapts to changing human risk tolerance and generalizes across similar actions, it reduced human interruptions by about 1.8x in the reported simulation while keeping accuracy high. Be aware that the "ask band" rule was no more sample-efficient than random querying, so a more principled acquisition function is worth exploring in your design.

Key insights

Trust calibration for agentic tool use can be formalized as a preference-learning problem using Gaussian processes.
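The summary gives no equations, so the block below writes out one standard way to set up this formalization, assumed here rather than taken from the paper: GP binary classification with a probit link. Here $a$ is an action, $y$ the human's approve/deny response, $\Phi$ the standard normal CDF, and $\mathcal{D}$ the feedback history; the closed form for $\pi(a)$ is the usual probit-Gaussian identity.

```latex
% Latent risk-tolerance function over actions (assumed zero-mean GP prior)
f \sim \mathcal{GP}\bigl(0,\, k(a, a')\bigr), \qquad
p\bigl(y = \text{approve} \mid a, f\bigr) = \Phi\bigl(f(a)\bigr)

% Gaussian approximate posterior: f(a) \mid \mathcal{D} \approx \mathcal{N}\bigl(\mu(a), \sigma^2(a)\bigr)
\pi(a) = \int \Phi(f)\, \mathcal{N}\bigl(f \mid \mu(a), \sigma^2(a)\bigr)\, df
       = \Phi\!\left(\frac{\mu(a)}{\sqrt{1 + \sigma^2(a)}}\right)
```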

Principles

Human risk tolerance is a latent function to be learned.
Uncertainty-targeted querying aims to maximize the information gained per human interruption.
Correlated generalization improves decision accuracy across similar actions.
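A minimal sketch of uncertainty-targeted escalation, assuming the gateway already has a posterior mean and variance for each pending action's latent value. The thresholds `allow_at`/`block_at`, the helper names, and the tool names are hypothetical; the paper's exact ask-band rule may differ.

```python
import math


def approval_probability(mean: float, var: float) -> float:
    """Predictive P(approve) = Phi(mean / sqrt(1 + var)) for a GP-probit posterior."""
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * (1.0 + var))))


def decide(mean: float, var: float, allow_at: float = 0.9, block_at: float = 0.1) -> str:
    """Map a posterior to the allow / block / ask region (hypothetical thresholds)."""
    p = approval_probability(mean, var)
    if p >= allow_at:
        return "allow"
    if p <= block_at:
        return "block"
    return "ask"


def pick_query(candidates: list[tuple[str, float, float]]) -> str:
    """Escalate the pending action whose approval is most uncertain (p closest to 0.5)."""
    name, _, _ = min(candidates, key=lambda c: abs(approval_probability(c[1], c[2]) - 0.5))
    return name


# Hypothetical pending actions: (name, posterior mean, posterior variance).
pending = [("read_file", 3.0, 0.2), ("send_email", 0.1, 1.0), ("delete_repo", -2.5, 0.3)]
print([decide(m, v) for _, m, v in pending])  # ['allow', 'ask', 'block']
print(pick_query(pending))                    # send_email
print(decide(2.0, 0.1), decide(2.0, 8.0))     # allow ask
```

The last line shows the design point: the same posterior mean lands in the ask band when the variance is large, because uncertainty pulls the predictive probability toward 0.5.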

Method

A policy gateway maintains a Gaussian-process posterior over latent human risk tolerance, observed through a probit likelihood on binary approve/deny feedback. It classifies actions into allow, block, and ask regions and escalates the uncertain (ask) actions to a human. Non-stationarity in the human's risk tolerance is handled by a time-decaying kernel.
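A minimal sketch of the posterior update, assuming a squared-exponential kernel over a hypothetical numeric action-feature vector and the standard Laplace approximation for GP probit classification (Rasmussen & Williams, Algorithms 3.1 and 3.2). The paper may use a different inference scheme; `ProbitGateway`, `fit`, and `predict_proba` are illustrative names, not the authors' code.

```python
import math

import numpy as np

# numpy has no erf, so vectorize math.erf for the standard normal CDF.
_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def norm_pdf(z):
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def rbf(A, B, lengthscale, variance):
    # Squared-exponential kernel between rows of A (n x d) and B (m x d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)


class ProbitGateway:
    """GP classifier with a probit likelihood; labels +1 = approved, -1 = denied."""

    def __init__(self, lengthscale=1.0, variance=1.0, max_iter=100, tol=1e-10):
        self.lengthscale, self.variance = lengthscale, variance
        self.max_iter, self.tol = max_iter, tol

    @staticmethod
    def _derivatives(f, y):
        # Gradient and negative Hessian (W) of sum_i log Phi(y_i * f_i).
        ratio = norm_pdf(f) / norm_cdf(y * f)
        return y * ratio, ratio**2 + y * f * ratio

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = len(y)
        K = rbf(self.X, self.X, self.lengthscale, self.variance)
        f = np.zeros(n)
        for _ in range(self.max_iter):
            # Newton step toward the posterior mode of the latent function.
            grad, W = self._derivatives(f, y)
            sW = np.sqrt(W)
            L = np.linalg.cholesky(np.eye(n) + sW[:, None] * K * sW[None, :])
            b = W * f + grad
            c = np.linalg.solve(L, sW * (K @ b))
            f_new = K @ (b - sW * np.linalg.solve(L.T, c))
            converged = np.max(np.abs(f_new - f)) < self.tol
            f = f_new
            if converged:
                break
        # Cache gradient, sqrt(W), and Cholesky factor at the mode for prediction.
        self.grad, W = self._derivatives(f, y)
        self.sW = np.sqrt(W)
        self.L = np.linalg.cholesky(np.eye(n) + self.sW[:, None] * K * self.sW[None, :])
        return self

    def predict_proba(self, Xs):
        # Predictive P(approve) = Phi(mean / sqrt(1 + var)) at each row of Xs.
        Ks = rbf(self.X, np.asarray(Xs, dtype=float), self.lengthscale, self.variance)
        mean = Ks.T @ self.grad
        v = np.linalg.solve(self.L, self.sW[:, None] * Ks)
        var = np.maximum(self.variance - (v**2).sum(axis=0), 0.0)
        return norm_cdf(mean / np.sqrt(1.0 + var))


# Hypothetical 1-D "risk score" feature: low scores approved, high scores denied.
X = [[0.0], [0.5], [1.0], [3.0], [3.5], [4.0]]
y = [1, 1, 1, -1, -1, -1]
print(ProbitGateway().fit(X, y).predict_proba([[0.25], [2.0], [3.75]]))
```

The resulting probabilities can then be compared against allow/block thresholds; an action halfway between the approved and denied examples sits at 0.5, squarely in the ask region.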

In practice

Implement a GP-probit policy gateway for agent action approval.
Use a time-decaying kernel to adapt to changing human risk tolerance.
Leverage structured kernels for correlated generalization across tools.
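The summary does not specify the kernel, so the sketch below assumes one plausible structure: a product of a squared-exponential kernel over action features, a tool-similarity term, and an exponential decay in time. `gateway_kernel`, `cross_tool`, `time_scale`, and the tool names are hypothetical. Because older feedback correlates less with the current time, the posterior drifts back toward the prior and previously settled actions can return to the ask band; the tool term lets a decision on one tool partially inform another.

```python
import numpy as np


def gateway_kernel(X1, tools1, t1, X2, tools2, t2,
                   lengthscale=1.0, cross_tool=0.5, time_scale=100.0):
    """Covariance between (action features, tool, timestamp) triples.

    k = k_action(x, x') * k_tool(tool, tool') * exp(-|t - t'| / time_scale)
    """
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    # Squared-exponential similarity between action feature vectors.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    k_action = np.exp(-0.5 * d2 / lengthscale**2)
    # 1 for the same tool, cross_tool otherwise; positive semidefinite for 0 <= cross_tool <= 1.
    same = np.asarray(tools1)[:, None] == np.asarray(tools2)[None, :]
    k_tool = np.where(same, 1.0, cross_tool)
    # Exponential (Ornstein-Uhlenbeck) decay: older feedback correlates less with newer queries.
    dt = np.abs(np.asarray(t1, dtype=float)[:, None] - np.asarray(t2, dtype=float)[None, :])
    k_time = np.exp(-dt / time_scale)
    return k_action * k_tool * k_time


# Covariance of one "shell" action queried now (t=1000) with three hypothetical past observations.
now = [1000.0]
print(gateway_kernel([[0.2]], ["shell"], now, [[0.2]], ["shell"], [1000.0]))  # [[1.]]
print(gateway_kernel([[0.2]], ["shell"], now, [[0.2]], ["shell"], [900.0]))   # exp(-1), about 0.368
print(gateway_kernel([[0.2]], ["shell"], now, [[0.2]], ["http"], [1000.0]))   # [[0.5]]
```

Each factor is a valid kernel, so their product is too; `time_scale` controls how quickly old approvals stop counting.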

Topics

Agentic AI Systems
Trust Calibration
Preference Learning
Gaussian Processes
Bayesian Optimization
Human-in-the-Loop AI

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.