Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States
Summary
A study of risk-neutral control in Markov Decision Processes (MDPs) with an absorbing catastrophic state finds that standard Bellman optimality alone generates prospect-theory-like behavior. Even with linear rewards and no curvature in the agent's utility, the optimal solution exhibits an S-shaped value function, an endogenous loss-sensitivity coefficient λ*(S) > 1, and a reflection-effect policy reversal. Across 495 configurations, the optimal policy plays safe near catastrophe in positive-drift regimes, even when the risky action offers higher immediate expected value. In negative-drift regimes it plays risky near catastrophe, even though the safe action has the lower immediate expected loss. The study derives a closed-form expression for the asymptotic loss-aversion plateau λ̄ that matches the numerical solutions with R^2 = 0.999. The effects persist under tabular Q-learning (correlation of 0.98 in the growth regime and 1.00 in the decline regime) and under stochastic noise of up to 50% of the step size.
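To make the mechanism concrete, here is a minimal value-iteration sketch on a toy chain MDP. It is not the study's model: the chain length N, the discount factor GAMMA, the step sizes and probabilities in OUTCOMES, the cap at N, and the zero value assigned to the catastrophic state are all illustrative assumptions. Rewards are linear (equal to the step drawn), and the risky action has the higher immediate expected value (1.5 versus 1.0).

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the study.
GAMMA = 0.95            # discount factor
N = 40                  # states 0..N; state 0 is the absorbing catastrophic state
SAFE, RISKY = 0, 1
# action -> list of (step, probability); the reward equals the step drawn
OUTCOMES = {
    SAFE: [(1, 1.0)],                 # expected step +1.0
    RISKY: [(5, 0.5), (-2, 0.5)],     # expected step +1.5
}

def q_values(V):
    """One Bellman backup: Q[s, a] = E[step + GAMMA * V[next state]]."""
    Q = np.zeros((N + 1, 2))          # row 0 stays zero: catastrophe is absorbing
    for s in range(1, N + 1):
        for a, outcomes in OUTCOMES.items():
            for step, p in outcomes:
                nxt = min(max(s + step, 0), N)   # clip into [0, N]
                Q[s, a] += p * (step + GAMMA * V[nxt])
    return Q

def value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate V <- max_a Q(V) until successive iterates differ by less than tol."""
    V = np.zeros(N + 1)
    for _ in range(max_iter):
        V_new = q_values(V).max(axis=1)
        V_new[0] = 0.0                # no value after catastrophe
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    policy = q_values(V).argmax(axis=1)   # greedy action per state
    return V, policy

V, policy = value_iteration()
for s in (1, 2, N):
    print(s, "SAFE" if policy[s] == SAFE else "RISKY", round(float(V[s]), 2))
```

In this toy setting the greedy policy picks the safe action at states 1 and 2, where the risky action's -2 step would hit the absorbing state, and the risky action at the top of the chain. This mirrors the positive-drift pattern described above, although the agent's rewards are linear throughout.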
Key takeaway
AI scientists designing optimal control systems for environments with potential catastrophic failures should recognize that standard Bellman optimality by itself produces prospect-theory-like risk behavior. A risk-neutral agent may exhibit an S-shaped value function and policy reversals near failure states, even without any explicit utility curvature. Account for this endogenous loss-sensitivity when modeling agent behavior and designing robust control strategies, especially in high-stakes applications.
Key insights
Bellman optimality in MDPs with absorbing catastrophic states intrinsically generates prospect-theory-like risk behaviors.
Principles
- Absorbing failure states are a sufficient mechanism for prospect-theory-like behavior.
- Optimal policies near catastrophe reverse with the system drift: safe under growth, risky under decline.
- Endogenous loss-sensitivity λ*(S) > 1 emerges from optimal control.
In practice
- Design control policies considering S-shaped value functions.
- Anticipate policy reversals near catastrophe in MDPs.
- Incorporate endogenous loss-sensitivity in risk-neutral models.
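The summary reports that the effects persist under tabular Q-learning. One hedged way to check for such behavior near an absorbing state is to train a small tabular learner and inspect its greedy actions there. The sketch below assumes a toy chain, not the study's environment: state 0 is catastrophic with value zero, the safe action always steps +1, the risky action steps +5 or -2 with equal odds, and N, GAMMA, ALPHA, the update count, and the uniform sampling of state-action pairs (in place of trajectory rollouts) are illustrative choices.

```python
import numpy as np

# Illustrative assumptions, not parameters from the study.
GAMMA, ALPHA = 0.95, 0.02   # discount factor and learning rate
N = 40                      # states 0..N; state 0 is absorbing and catastrophic
SAFE, RISKY = 0, 1

def sample_step(action, u):
    """Safe always steps +1; risky steps +5 or -2 with equal probability."""
    if action == SAFE:
        return 1
    return 5 if u < 0.5 else -2

def q_learning(n_updates=400_000, seed=0):
    """Tabular Q-learning on uniformly sampled (state, action) pairs."""
    rng = np.random.default_rng(seed)
    states = rng.integers(1, N + 1, size=n_updates)   # non-catastrophic states
    actions = rng.integers(0, 2, size=n_updates)
    coins = rng.random(n_updates)
    Q = np.zeros((N + 1, 2))       # Q[0] is never updated, so it stays at zero
    for s, a, u in zip(states, actions, coins):
        step = sample_step(a, u)
        nxt = min(max(s + step, 0), N)                # clip into [0, N]
        target = step + GAMMA * Q[nxt].max()          # reward is the step itself
        Q[s, a] += ALPHA * (target - Q[s, a])
    return Q

Q = q_learning()
greedy = Q.argmax(axis=1)
for s in (1, 2):
    print(s, "SAFE" if greedy[s] == SAFE else "RISKY", np.round(Q[s], 2))
```

With these assumed parameters, the learned Q-values should favor the safe action at states 1 and 2, where the risky action can fall into the absorbing state, even though its immediate expected reward is higher. Comparing the learned greedy actions against an exact solution, where one is available, is a reasonable sanity check before relying on either.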
Topics
- Markov Decision Processes
- Bellman Optimality
- Prospect Theory
- Risk-Neutral Control
- Catastrophic States
- Q-learning
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.