Robust Exploratory Stopping under Ambiguity in Reinforcement Learning

2026-04-17 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

This paper introduces and analyzes a continuous-time robust reinforcement learning framework for optimal stopping problems under ambiguity. The framework addresses settings in which an agent must make robust decisions while still learning about an unknown environment, and in which the learned beliefs may themselves be wrong. Using the $g$-expectation framework, the authors reformulate the optimal stopping problem under ambiguity as an entropy-regularized optimal control problem with Bernoulli-distributed controls, which is how exploration enters the problem. They derive the optimal Bernoulli-distributed control, characterized by backward stochastic differential equations (BSDEs), and establish a policy iteration theorem, which they then implement as a reinforcement learning algorithm. Numerical experiments on American put-type and call-type stopping problems demonstrate the algorithm's convergence and robustness across varying levels of ambiguity and exploration; out of sample, higher ambiguity levels yield lower relative errors.

Key takeaway

For AI Scientists and Research Scientists developing robust decision-making systems in uncertain environments, this framework offers a way to integrate exploration and ambiguity aversion into optimal stopping. Consider implementing the proposed policy iteration algorithm with deep splitting methods for policy evaluation; in the paper's financial option pricing examples, it converges and shows enhanced robustness against model misspecification.

Key insights

A robust RL framework for optimal stopping under ambiguity balances exploration and exploitation using $g$-expectation and BSDEs.

Principles

Ambiguity can persist despite extensive learning.
Entropy regularization enables exploration in stopping rules.
Policy iteration ensures convergence to optimal controls.
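
To make the second principle concrete, here is a minimal Python sketch of how an entropy bonus turns a hard stop/continue decision into a stopping probability. It is a generic one-step illustration, not the paper's BSDE-based characterization: the names `advantage` (gain from stopping now rather than continuing) and `temperature` (weight on the entropy term) are assumptions introduced here. Maximizing `p * advantage + temperature * H(p)` over a Bernoulli probability `p`, with `H` the binary entropy, gives a logistic function of `advantage / temperature`.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (in nats) of a Bernoulli(p) random variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def regularized_objective(p: float, advantage: float, temperature: float) -> float:
    """Expected one-step gain from stopping with probability p, plus an entropy bonus.

    `advantage` and `temperature` are hypothetical names for this illustration.
    """
    return p * advantage + temperature * binary_entropy(p)


def optimal_stop_probability(advantage: float, temperature: float) -> float:
    """Maximizer of regularized_objective over p: sigmoid(advantage / temperature)."""
    x = advantage / temperature
    # Numerically stable evaluation of the logistic function.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    for temperature in (1.0, 0.1, 0.01):
        p = optimal_stop_probability(0.05, temperature)
        print(f"temperature={temperature}: stop probability={p:.4f}")
```

As the temperature shrinks, the probability approaches 0 or 1 and the rule collapses toward a deterministic stopping decision; a larger temperature keeps the agent closer to a coin flip and therefore exploring.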

Method

Reformulate optimal stopping as an entropy-regularized control problem under $g$-expectation, derive optimal Bernoulli controls via BSDEs, and implement a policy iteration algorithm using deep splitting methods.
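
For readers new to the $g$-expectation, the standard definition can be written as follows. Given a terminal payoff $\xi$, a Brownian motion $W$, and a generator $g$ with $g(s, 0) = 0$, one solves a BSDE for the pair $(Y, Z)$ and takes the initial value as the nonlinear expectation. The specific generator in the second display is an assumption chosen for illustration (a common choice, often called $\kappa$-ignorance); this summary does not state which generator the paper uses.

$$
Y_t = \xi + \int_t^T g(s, Z_s)\,ds - \int_t^T Z_s\,dW_s, \qquad \mathcal{E}^g[\xi] := Y_0 .
$$

$$
g(s, z) = -\kappa\,|z|,\ \kappa \ge 0
\quad\Longrightarrow\quad
\mathcal{E}^g[\xi] = \inf_{|\theta_s| \le \kappa} \mathbb{E}^{\mathbb{Q}^\theta}[\xi],
$$

where $\mathbb{Q}^\theta$ is the measure obtained from the reference measure by a Girsanov change of drift $\theta$. In this example the parameter $\kappa$ plays the role of an ambiguity level: $\kappa = 0$ recovers the ordinary expectation, and a larger $\kappa$ makes the evaluation more conservative.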

In practice

Apply to American option pricing problems.
Use deep splitting for policy evaluation.
Test robustness against dividend rate misspecification.
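
As a rough baseline for the dividend misspecification test, the sketch below prices an American put on a Cox-Ross-Rubinstein binomial tree and reports how far the price under an assumed dividend rate sits from the price under other "true" rates. This is a classical non-robust benchmark, not the paper's deep splitting algorithm, and every numerical parameter (spot, strike, rates, volatility, maturity, number of steps) is a hypothetical choice made for this illustration.

```python
import math


def american_put_crr(s0, strike, rate, dividend, sigma, maturity, steps=200):
    """American put price on a CRR binomial tree with a continuous dividend yield."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    discount = math.exp(-rate * dt)
    # Risk-neutral probability of an up move when the asset pays a dividend yield.
    q = (math.exp((rate - dividend) * dt) - down) / (up - down)

    # Payoffs at maturity; index j counts the number of up moves.
    values = [
        max(strike - s0 * up**j * down ** (steps - j), 0.0)
        for j in range(steps + 1)
    ]
    # Backward induction: at each node take the larger of immediate
    # exercise and the discounted expected continuation value.
    for n in range(steps - 1, -1, -1):
        values = [
            max(
                strike - s0 * up**j * down ** (n - j),
                discount * (q * values[j + 1] + (1.0 - q) * values[j]),
            )
            for j in range(n + 1)
        ]
    return values[0]


if __name__ == "__main__":
    # Hypothetical market parameters for the illustration.
    spot, strike, rate, sigma, maturity = 100.0, 100.0, 0.05, 0.2, 1.0
    assumed_dividend = 0.02
    assumed_price = american_put_crr(spot, strike, rate, assumed_dividend, sigma, maturity)
    for true_dividend in (0.0, 0.02, 0.04):
        true_price = american_put_crr(spot, strike, rate, true_dividend, sigma, maturity)
        gap = abs(assumed_price - true_price) / true_price
        print(f"true dividend={true_dividend:.2f}: relative gap={gap:.3%}")
```

The relative gap printed here measures only the sensitivity of the price to the dividend rate. An out-of-sample comparison in the spirit of the paper would instead run the learned stopping policy on paths simulated under the misspecified rate.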

Topics

Robust Reinforcement Learning
Optimal Stopping Problems
Ambiguity Aversion
$g$-expectation
Backward Stochastic Differential Equations

Code references

GEOR-TS/Exploratory_Robust_Stopping_RL

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.