Robust Exploratory Stopping under Ambiguity in Reinforcement Learning
Summary
This paper introduces and analyzes a continuous-time robust reinforcement learning framework for optimal stopping problems under ambiguity. The framework targets settings in which an agent must make robust decisions while still learning about an unknown environment, allowing for errors in its learned beliefs. Using the $g$-expectation framework, the authors reformulate the optimal stopping problem under ambiguity as an entropy-regularized optimal control problem with Bernoulli-distributed controls, which builds exploration into the stopping rule. They derive the optimal Bernoulli-distributed control, characterized by backward stochastic differential equations (BSDEs), and establish a policy iteration theorem, which they then implement as a reinforcement learning algorithm. Numerical experiments on American put-type and call-type stopping problems demonstrate the algorithm's convergence and its robustness across varying levels of ambiguity and exploration; in out-of-sample tests, higher ambiguity levels yield lower relative errors.
Key takeaway
For AI Scientists and Research Scientists building robust decision-making systems in uncertain environments, this framework offers a way to integrate exploration and ambiguity aversion into optimal stopping. Consider implementing the proposed policy iteration algorithm, with deep splitting methods for policy evaluation; the financial option pricing examples show convergence and improved robustness against model misspecification.
Key insights
A robust RL framework for optimal stopping under ambiguity balances exploration and exploitation using $g$-expectation and BSDEs.
Principles
- Ambiguity can persist despite extensive learning.
- Entropy regularization enables exploration in stopping rules.
- Policy iteration ensures convergence to optimal controls.
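To make the entropy-regularization principle concrete, here is a minimal Python sketch of a one-step toy problem. It is not the paper's BSDE-based formula. It assumes a hypothetical scalar `advantage` (the gain from stopping rather than continuing) and a hypothetical `temperature` weighting the entropy bonus. Maximizing `p * advantage + temperature * H(p)` over the stopping probability `p` gives a logistic (sigmoid) rule, so the agent keeps a nonzero probability of exploring the less attractive action.

```python
import math


def bernoulli_entropy(p):
    """Shannon entropy (in nats) of a Bernoulli(p) random variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def regularized_objective(p, advantage, temperature):
    """Expected gain from stopping with probability p, plus an entropy bonus."""
    return p * advantage + temperature * bernoulli_entropy(p)


def optimal_stop_probability(advantage, temperature):
    """Maximizer of the objective above: sigmoid(advantage / temperature)."""
    x = advantage / temperature
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # numerically stable branch for negative x
    return e / (1.0 + e)


def optimal_value(advantage, temperature):
    """Maximal objective value: temperature * log(1 + exp(advantage / temperature))."""
    x = advantage / temperature
    return temperature * (max(x, 0.0) + math.log1p(math.exp(-abs(x))))


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not taken from the paper).
    for temperature in (1.0, 0.5, 0.1, 0.01):
        p_star = optimal_stop_probability(0.3, temperature)
        print(f"temperature={temperature:<5} stop probability={p_star:.4f}")
```

As the temperature shrinks, the stopping probability approaches 0 or 1, recovering a deterministic stop-or-continue rule; larger temperatures keep the rule closer to a fair coin flip.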
Method
Reformulate optimal stopping as an entropy-regularized control problem under $g$-expectation, derive optimal Bernoulli controls via BSDEs, and implement a policy iteration algorithm using deep splitting methods.
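For orientation, the standard textbook definition of a $g$-expectation through a BSDE is sketched below. The notation ($W$, $\xi$, $g$, $\kappa$, $\mathcal{P}^\kappa$) is assumed here for illustration, and the generator used in the paper may differ from the example shown.

```latex
% Assumed setting: W is a Brownian motion, \xi is an \mathcal{F}_T-measurable
% payoff, and g is a Lipschitz generator with g(t, y, 0) = 0.
\begin{align*}
Y_t &= \xi + \int_t^T g(s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dW_s, \qquad 0 \le t \le T,\\
\mathcal{E}^g[\xi \mid \mathcal{F}_t] &:= Y_t .
\end{align*}
% Illustrative example (kappa-ignorance): g(t, y, z) = -\kappa |z| with \kappa \ge 0,
% where \mathcal{P}^\kappa is the set of measures whose Girsanov kernel is bounded by \kappa.
\[
\mathcal{E}^g[\xi] = \inf_{Q \in \mathcal{P}^\kappa} \mathbb{E}^Q[\xi].
\]
```

In this example a larger $\kappa$ widens the set of candidate models and therefore gives a more conservative (lower) evaluation of the same payoff.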
In practice
- Apply to American option pricing problems.
- Use deep splitting for policy evaluation.
- Test robustness against dividend rate misspecification.
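A dividend-rate misspecification check can be prototyped without any learning component. The sketch below swaps in a Cox-Ross-Rubinstein binomial tree in place of the paper's deep splitting method, purely as a cheap reference pricer. All parameter values, the function names `american_put_crr` and `relative_error`, and the choice of dividend rates are assumptions for illustration, not the paper's experimental setup.

```python
import math


def american_put_crr(spot, strike, rate, dividend, vol, maturity, steps=200):
    """American put price on a Cox-Ross-Rubinstein tree with a continuous dividend yield."""
    dt = maturity / steps
    up = math.exp(vol * math.sqrt(dt))
    # Risk-neutral up-move probability; a down move multiplies the price by 1 / up.
    prob_up = (math.exp((rate - dividend) * dt) - 1.0 / up) / (up - 1.0 / up)
    discount = math.exp(-rate * dt)

    # Payoffs at maturity; node j has j up-moves and (steps - j) down-moves.
    values = [max(strike - spot * up ** (2 * j - steps), 0.0) for j in range(steps + 1)]

    # Backward induction: at each node take the better of exercising and continuing.
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            continuation = discount * (prob_up * values[j + 1] + (1.0 - prob_up) * values[j])
            exercise = max(strike - spot * up ** (2 * j - n), 0.0)
            values[j] = max(continuation, exercise)
    return values[0]


def relative_error(reference, estimate):
    """Absolute deviation of the estimate from the reference, relative to the reference."""
    return abs(estimate - reference) / abs(reference)


if __name__ == "__main__":
    # Hypothetical setup: the model assumes a 2% dividend yield, the true yield varies.
    assumed_dividend = 0.02
    assumed_price = american_put_crr(100.0, 100.0, 0.05, assumed_dividend, 0.2, 1.0)
    for true_dividend in (0.0, 0.02, 0.04):
        true_price = american_put_crr(100.0, 100.0, 0.05, true_dividend, 0.2, 1.0)
        err = relative_error(true_price, assumed_price)
        print(f"true dividend={true_dividend:.2f} relative error={err:.4f}")
```

A learned stopping policy could be scored the same way, by comparing its realized value under each true dividend rate with the tree price for that rate.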
Topics
- Robust Reinforcement Learning
- Optimal Stopping Problems
- Ambiguity Aversion
- $g$-expectation
- Backward Stochastic Differential Equations
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.