The Refined Counterfactual Prisoner's Dilemma

Source: AI Alignment Forum · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Advanced, quick

Summary

The "Refined Counterfactual Prisoner's Dilemma" is a thought experiment designed to illustrate a potential flaw in expected utility maximization: the assumption that an agent stops caring about counterfactual worlds once it has made an observation. Inspired by Scott Garrabrant's critique of utility theory, the dilemma posits a perfect predictor, Omega, who flips a coin and reveals the result. Whichever way the coin lands, Omega asks the agent for $1. Omega also predicts what the agent would have done had the coin landed the other way, and if it predicts the agent would not have paid in that counterfactual scenario, it inflicts $1 million in damage. An agent that ignores the counterfactual branch therefore refuses a trivial payment in both branches and burns significant value whichever way the coin lands, which suggests deeper problems for any decision theory that fails under perfect prediction.
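The payoff structure described above can be made concrete with a small sketch. Everything in it is illustrative rather than taken from the original article: the names `payoff`, `expected_payoff`, `all_policies`, and `updateful_choice` are hypothetical, the coin is assumed to be fair, and Omega's perfect prediction is modelled by simply reading the agent's policy for the other branch. The `updateful_choice` function is one simplified way to model an agent that treats the counterfactual branch as fixed after observing the coin; it is not a claim about any particular decision theory.

```python
from itertools import product

PAYMENT = 1            # what Omega asks for in either branch
DAMAGE = 1_000_000     # inflicted if Omega predicts refusal in the other branch
COIN = ("heads", "tails")


def other(side):
    """Return the coin result that was not observed."""
    return "tails" if side == "heads" else "heads"


def payoff(policy, observed):
    """Utility in the branch where the coin landed `observed`.

    `policy` maps each coin result to True (pay) or False (refuse).
    Assumption: Omega predicts perfectly, so its prediction for the
    counterfactual branch is just the policy's entry for that branch.
    """
    cost = PAYMENT if policy[observed] else 0
    penalty = 0 if policy[other(observed)] else DAMAGE
    return -(cost + penalty)


def expected_payoff(policy):
    """Average payoff over both branches (assumes a fair coin)."""
    return sum(payoff(policy, side) for side in COIN) / len(COIN)


def all_policies():
    """Yield the four possible pay/refuse policies."""
    for pay_heads, pay_tails in product((True, False), repeat=2):
        yield {"heads": pay_heads, "tails": pay_tails}


def updateful_choice(observed, predicted_other):
    """Decide after observing the coin, holding Omega's prediction fixed.

    Returns True if paying looks strictly better than refusing inside
    the observed branch alone.
    """
    pay = payoff({observed: True, other(observed): predicted_other}, observed)
    refuse = payoff({observed: False, other(observed): predicted_other}, observed)
    return pay > refuse


if __name__ == "__main__":
    for policy in all_policies():
        print(policy, expected_payoff(policy))
    # Policy induced by branch-by-branch reasoning, for either prediction.
    for predicted_other in (True, False):
        induced = {side: updateful_choice(side, predicted_other) for side in COIN}
        print("updateful:", induced, expected_payoff(induced))
```

Under these assumptions, paying never looks worthwhile from inside a single branch, because the $1 only affects the penalty in the branch the agent did not observe. Scored over whole policies, though, paying in both branches is the only option that avoids the damage.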

Key takeaway

AI scientists developing decision-making algorithms should critically evaluate their models' assumptions about counterfactuals and updatelessness. A decision theory that fails when confronted with a perfect predictor like Omega likely has fundamental problems, and these could lead to suboptimal or harmful outcomes in complex real-world scenarios where predictive capabilities are advanced. Check that your agents account for the consequences of their choices in states they did not observe.

Key insights

Ignoring counterfactual outcomes in decision-making can lead to significant, avoidable losses when facing perfect predictors.

Best for: AI Scientist, AI Researcher, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.