Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The paper introduces Nested Contextual Causal Bandits (NCCBs), a problem class that formalizes critical sequential decisions in which strategic choices causally shape subsequent tactical ones within a hierarchical Structural Causal Model (SCM). To solve it, the authors propose Nested Causal Thompson Sampling (NCTS), which draws one mechanism-factorised belief per episode and acts on it recursively. The key theoretical contribution is a causal PAC-Bayesian excess-risk bound that enables off-policy, anytime certification of any candidate deployment policy from historical data. In experiments, NCTS's factorised SCM-mechanism posterior achieves significantly better zero-shot transfer under exogenous distribution shifts than RFF-GP joint regression. The recursive meta-to-inner commit also dominates joint-commit alternatives, and the certificate contracts as offline data accumulates. These findings support "progressive certified handover," a safe-deployment method in which each timescale independently switches from a legacy controller to NCTS once a gain has been certified.
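
The way such a certificate shrinks with data can be illustrated with the classical PAC-Bayes bound for i.i.d. samples (the McAllester/Maurer form). This is a generic textbook bound used as a stand-in, not the paper's causal, off-policy, anytime bound, which this summary does not reproduce. All symbols are introduced here for illustration: $P$ is a data-independent prior over policies, $Q$ is any posterior, $\hat{L}_n(Q)$ is the empirical loss on $n \ge 8$ samples, $L(Q)$ is the true expected loss, losses lie in $[0,1]$, and $\delta$ is the failure probability.

```latex
\[
\Pr\!\left[\;\forall Q:\quad
L(Q) \;\le\; \hat{L}_n(Q)
  + \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
\;\right] \;\ge\; 1-\delta .
\]
```

For a fixed $Q$, the square-root term decays on the order of $\sqrt{\ln n / n}$, so the gap between certified and empirical risk narrows as more historical data is logged.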

Key takeaway

For AI scientists designing sequential decision systems with hierarchical causal dependencies, Nested Causal Thompson Sampling (NCTS) offers certified policy optimization and improved zero-shot transfer under exogenous distribution shifts. Consider its progressive certified handover method for moving off legacy controllers safely: each timescale switches independently, and only once a performance gain has been certified. This reduces deployment risk in complex, multi-timescale environments.

Key insights

Nested Causal Thompson Sampling (NCTS) offers certified policy optimization for hierarchical causal decision-making under distribution shifts.

Principles

Method

NCTS draws one mechanism-factorised belief per episode and acts recursively. Progressive certified handover allows independent timescale transitions from legacy controllers to NCTS when gains are certifiable.
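
The two ideas above can be sketched on a toy problem. This is a minimal illustration under strong assumptions, not the paper's algorithm: the two-level discrete setting (meta action `a`, intermediate context `z`, inner action `b`, Bernoulli reward), the Dirichlet and Beta posteriors, and every name (`NestedTS`, `run`, `should_hand_over`) are hypothetical. It shows one belief being drawn per episode from separate per-mechanism posteriors and then reused at both decision levels, plus a simple handover rule that compares a certified lower bound with the legacy controller's value.

```python
import numpy as np


class NestedTS:
    """Toy nested Thompson sampler (hypothetical; not the paper's NCTS)."""

    def __init__(self, n_meta, n_ctx, n_inner, seed=0):
        self.rng = np.random.default_rng(seed)
        # One posterior per mechanism (the "factorised" belief):
        # Dirichlet counts for P(z | a), Beta counts for P(r = 1 | z, b).
        self.dir_counts = np.ones((n_meta, n_ctx))
        self.beta_a = np.ones((n_ctx, n_inner))
        self.beta_b = np.ones((n_ctx, n_inner))

    def sample_belief(self):
        # Draw every mechanism once; the draw is held fixed for the episode.
        p_z = np.vstack([self.rng.dirichlet(c) for c in self.dir_counts])
        p_r = self.rng.beta(self.beta_a, self.beta_b)
        return p_z, p_r

    def meta_action(self, belief):
        # Value of a meta action = expected best inner reward under the draw.
        p_z, p_r = belief
        return int(np.argmax(p_z @ p_r.max(axis=1)))

    def inner_action(self, belief, z):
        # Inner level acts on the same draw, given the realised context z.
        _, p_r = belief
        return int(np.argmax(p_r[z]))

    def update(self, a, z, b, r):
        # Each observation updates only the mechanism that generated it.
        self.dir_counts[a, z] += 1
        self.beta_a[z, b] += r
        self.beta_b[z, b] += 1 - r


def run(agent, true_p_z, true_p_r, episodes, seed=1):
    """Simulate episodes against a known toy environment; return mean reward."""
    env = np.random.default_rng(seed)
    total = 0
    for _ in range(episodes):
        belief = agent.sample_belief()      # one belief per episode
        a = agent.meta_action(belief)       # strategic (meta) decision
        z = int(env.choice(true_p_z.shape[1], p=true_p_z[a]))
        b = agent.inner_action(belief, z)   # tactical (inner) decision
        r = int(env.random() < true_p_r[z, b])
        agent.update(a, z, b, r)
        total += r
    return total / episodes


def should_hand_over(candidate_lower_bound, legacy_value, margin=0.0):
    """Hypothetical handover rule: switch only if the certified lower bound
    on the candidate's value beats the legacy value by at least `margin`."""
    return candidate_lower_bound > legacy_value + margin


if __name__ == "__main__":
    true_p_z = np.array([[0.9, 0.1], [0.2, 0.8]])  # assumed P(z | a)
    true_p_r = np.array([[0.3, 0.4], [0.8, 0.1]])  # assumed P(r = 1 | z, b)
    agent = NestedTS(n_meta=2, n_ctx=2, n_inner=2)
    print("mean reward:", run(agent, true_p_z, true_p_r, episodes=2000))
    print("hand over:", should_hand_over(0.62, legacy_value=0.55))
```

In a multi-timescale deployment, a rule like `should_hand_over` would be evaluated separately for each level, so the meta and inner controllers could switch at different times; how the lower bound is computed is left abstract here.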

In practice

Topics

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.