Prospect-Theory Behavior from Bellman Optimality in MDPs with Catastrophic States

2026-05-31 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

A study of risk-neutral control in Markov Decision Processes (MDPs) with an absorbing catastrophic state finds that standard Bellman optimality alone generates prospect-theory-like behavior. With linear rewards and no curvature in the agent's utility, the model still exhibits an S-shaped value-function profile, an endogenous loss-sensitivity coefficient λ*(S) > 1, and a reflection-effect policy reversal. Across 495 configurations, the optimal policy consistently plays safe near catastrophe in positive-drift (growth) regimes, even when the risky action offers higher immediate expected value. In negative-drift (decline) regimes it does the opposite and plays risky near catastrophe, even though the safe action has the lower immediate expected loss. The study derives a closed-form expression for the asymptotic loss-aversion plateau λ̄ that matches the numerical solutions with R^2 = 0.999. The same phenomena persist under tabular Q-learning, with correlations of 0.98 in the growth regime and 1.00 in the decline regime, and under stochastic noise of up to 50% of the step size.

Key takeaway

AI scientists designing optimal control systems for environments with potential catastrophic failures should recognize that standard Bellman optimality by itself produces prospect-theory-like risk behavior. A risk-neutral agent may exhibit an S-shaped value function and policy reversals near failure states even when no utility curvature is built into its objective. Account for this endogenous loss-sensitivity when modeling agent behavior and designing robust control strategies, especially in high-stakes applications.

Key insights

Bellman optimality in MDPs with absorbing catastrophic states intrinsically generates prospect-theory-like risk behaviors.

Principles

Absorbing failure states are a sufficient mechanism for prospect-theory-like behavior.
Optimal policies reverse based on system drift (growth vs. decline).
Endogenous loss-sensitivity λ*(S) > 1 emerges from optimal control.
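
The mechanism behind these principles can be written down with the standard Bellman optimality equation. The block below is a sketch in assumed notation, not the study's own formulation: s_c denotes the catastrophic state, γ the discount factor, and the catastrophe is assumed to carry zero continuation value.

```latex
\begin{aligned}
V^*(s) &= \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, r(s, a, s') + \gamma V^*(s') \,\bigr],
  \qquad s \neq s_c, \qquad V^*(s_c) = 0, \\
Q^*(s, a) &= \mathbb{E}\bigl[\, r \mid s, a \,\bigr]
  + \gamma \sum_{s' \neq s_c} P(s' \mid s, a)\, V^*(s').
\end{aligned}
```

Under these assumptions, any probability mass an action places on s_c drops out of the continuation term. An action with higher expected immediate reward can therefore still have the lower Q-value when it raises the chance of absorption, with no utility curvature involved.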

In practice

Design control policies considering S-shaped value functions.
Anticipate policy reversals near catastrophe in MDPs.
Incorporate endogenous loss-sensitivity in risk-neutral models.
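
One way to see the safe-near-catastrophe pattern in practice is to solve a small chain MDP by value iteration. The sketch below is a toy model and not the study's setup: the chain length, the discount factor, and the two hypothetical actions (a safe step of +1, and a risky step of +4 or -1 with equal probability) are all assumptions chosen for illustration. Rewards equal the step taken, so they are linear and the agent is risk-neutral.

```python
import numpy as np

SAFE, RISKY = 0, 1  # action indices (hypothetical labels)


def solve_chain(n_states=21, gamma=0.95, tol=1e-10, max_iter=100_000):
    """Value iteration on a 1-D chain; state 0 is the absorbing catastrophe.

    Assumed toy dynamics (not taken from the study):
      SAFE  moves +1 with probability 1            (expected step +1.0)
      RISKY moves +4 or -1 with probability 0.5    (expected step +1.5)
    The reward is the step itself; the state is clipped to [0, n_states - 1].
    """
    outcomes = {
        SAFE: [(1.0, 1)],
        RISKY: [(0.5, 4), (0.5, -1)],
    }
    top = n_states - 1
    V = np.zeros(n_states)          # V[0] stays 0: no reward after catastrophe
    Q = np.zeros((n_states, 2))     # Q[0, :] stays 0 for the same reason
    for _ in range(max_iter):
        for s in range(1, n_states):
            for a, outs in outcomes.items():
                q = 0.0
                for p, step in outs:
                    nxt = min(max(s + step, 0), top)
                    q += p * (step + gamma * V[nxt])
                Q[s, a] = q
        V_new = Q.max(axis=1)
        V_new[0] = 0.0
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    policy = Q.argmax(axis=1)       # greedy action per state
    return V, Q, policy


if __name__ == "__main__":
    V, Q, policy = solve_chain()
    for s in (1, 2, 5, 10):
        name = "safe" if policy[s] == SAFE else "risky"
        print(f"state {s:2d}: V = {V[s]:.3f}, greedy action = {name}")
```

In this toy setup the greedy action at state 1 is the safe one, even though the risky action has the higher expected step (+1.5 versus +1.0), because the risky action reaches the absorbing state with probability 0.5 and forfeits all future reward. Far from the catastrophe, for example at state 10, the risky action is chosen again.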

Topics

Markov Decision Processes
Bellman Optimality
Prospect Theory
Risk-Neutral Control
Catastrophic States
Q-learning

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.