Not all uncertainty is alike: volatility, stochasticity, and exploration
Summary
This paper introduces a novel framework demonstrating that distinct sources of environmental uncertainty—volatility (latent reward states drifting over time) and stochasticity (noisy outcome observations)—have opposing effects on optimal exploration in adaptive decision-making. While both increase posterior uncertainty, volatility enhances exploration by creating information gain, whereas stochasticity suppresses it by degrading observation utility. The authors formally establish this asymmetry by extending the Gittins index framework to Gaussian state-space bandits with latent dynamics. They further derive Cause-Aware Uncertainty-Sensitive Exploration (CAUSE), a closed-form index policy obtained via control-as-inference, which inherits these monotonicities. CAUSE empirically outperforms standard exploration strategies like Thompson sampling and UCB in environments with heterogeneous noise, and matches the optimal Gittins reference in rested bandit settings. The framework also predicts that miscalibrated noise inference can lead to reversed exploration patterns, offering insights into computational accounts of psychiatric conditions.
Key takeaway
Machine learning engineers developing adaptive decision-making agents for dynamic environments should differentiate between volatility and stochasticity when designing exploration strategies. Relying solely on total uncertainty (as in UCB or Thompson sampling) can lead to suboptimal exploration, particularly over-exploring noisy arms. Implementing CAUSE or a similar cause-aware index lets agents prioritize information gain from environmental drift while minimizing unproductive exploration in highly stochastic settings, potentially reducing regret in restless bandit problems.
Key insights
Optimal exploration is enhanced by volatility but suppressed by stochasticity, challenging the "more uncertainty, more exploration" paradigm.
Principles
- Volatility creates information gain.
- Stochasticity destroys information gain.
- Optimal exploration depends on uncertainty source.
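The asymmetry in these principles can be illustrated with a scalar Kalman filter. The sketch below is not the paper's CAUSE index; it assumes a single arm whose latent reward follows a Gaussian random walk (variance `volatility`) and is observed with Gaussian noise (variance `stochasticity`). The function name `steady_state` and the parameter values are hypothetical, chosen only for illustration. Under that assumption, raising either noise source increases posterior variance, but only volatility increases the Kalman gain, the weight a new outcome receives.

```python
def steady_state(volatility, stochasticity, prior_var=1.0, steps=500):
    """Iterate the scalar Kalman variance recursion until it settles.

    volatility    -- variance of the random-walk drift in the latent reward
    stochasticity -- variance of the observation noise
    Returns (posterior_variance, kalman_gain) after the final update.
    """
    post_var = prior_var
    gain = 0.0
    for _ in range(steps):
        # Predict: drift in the latent state adds uncertainty.
        pred_var = post_var + volatility
        # Gain: fraction of the prediction error attributed to the latent state.
        gain = pred_var / (pred_var + stochasticity)
        # Update: the observation removes a share of the uncertainty.
        post_var = (1.0 - gain) * pred_var
    return post_var, gain


# Illustrative settings (assumed values, not taken from the paper).
base_var, base_gain = steady_state(volatility=0.1, stochasticity=1.0)
vol_var, vol_gain = steady_state(volatility=0.4, stochasticity=1.0)
noise_var, noise_gain = steady_state(volatility=0.1, stochasticity=4.0)

for label, var, gain in [
    ("baseline", base_var, base_gain),
    ("higher volatility", vol_var, vol_gain),
    ("higher stochasticity", noise_var, noise_gain),
]:
    print(f"{label:<22} posterior var={var:.3f}  gain={gain:.3f}")
```

Both perturbed settings end with more posterior variance than the baseline, so a bonus based on total uncertainty would treat them alike, even though each new observation is worth more in the volatile case and less in the noisy one.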
Method
CAUSE is derived by casting action selection as posterior inference under an optimality constraint within the control-as-inference framework, yielding a closed-form index for restless bandits.
In practice
- Use CAUSE for restless multi-armed bandits.
- Distinguish volatility from observation noise.
- Investigate exploration reversals in psychiatric conditions.
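One possible way to distinguish volatility from observation noise in logged outcomes is a method-of-moments estimate. This is a swapped-in textbook technique, not the inference procedure described in the paper, and it assumes the same random-walk-plus-noise model for a single arm. Under that assumption, successive outcome differences have variance `volatility + 2 * stochasticity` and lag-1 autocovariance `-stochasticity`, so the two sources can be separated. The function names and parameter values below are hypothetical.

```python
import random


def simulate_outcomes(volatility, stochasticity, n, seed=0):
    """Draw n outcomes from a random-walk latent reward observed with noise."""
    rng = random.Random(seed)
    latent = 0.0
    outcomes = []
    for _ in range(n):
        latent += rng.gauss(0.0, volatility ** 0.5)  # latent drift
        outcomes.append(latent + rng.gauss(0.0, stochasticity ** 0.5))  # noisy outcome
    return outcomes


def estimate_noise_sources(outcomes):
    """Method-of-moments estimates of (volatility, stochasticity).

    Uses the variance and lag-1 autocovariance of first differences.
    Estimates are clipped at zero because sample moments can go negative.
    """
    diffs = [b - a for a, b in zip(outcomes, outcomes[1:])]
    mean = sum(diffs) / len(diffs)
    centered = [d - mean for d in diffs]
    var = sum(c * c for c in centered) / len(centered)
    lag1 = sum(a * b for a, b in zip(centered, centered[1:])) / (len(centered) - 1)
    stochasticity = max(-lag1, 0.0)
    volatility = max(var - 2.0 * stochasticity, 0.0)
    return volatility, stochasticity


# Assumed ground truth for the demo: volatility 0.5, stochasticity 2.0.
outcomes = simulate_outcomes(volatility=0.5, stochasticity=2.0, n=50_000)
v_hat, s_hat = estimate_noise_sources(outcomes)
print(f"estimated volatility={v_hat:.2f}, stochasticity={s_hat:.2f}")
```

The estimates are noisy for short histories, so in practice a filter-based or likelihood-based fit may be preferable; the sketch only shows that the two sources leave different signatures in the data.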
Topics
- Multi-armed Bandits
- Exploration-Exploitation
- Volatility
- Stochasticity
- Control-as-Inference
- Gittins Index
- Computational Psychiatry
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.