A single algorithm for both restless and rested rotting bandits
Summary
A new algorithm, Rotting Adaptive Window UCB (RAW-UCB), addresses the challenge of "rotting bandits" in applications like recommender systems and intelligent tutoring. These systems experience decreasing rewards over time, either due to past actions (rested rotting) or external factors (restless rotting). Previously, these two settings were considered distinct, with algorithms performing poorly when misapplied. RAW-UCB achieves near-optimal regret in both rested and restless rotting bandit scenarios without requiring prior knowledge of the setting or the type of non-stationarity (e.g., piece-wise constant, bounded variation). This contrasts with earlier findings that suggested no single algorithm could perform well if rewards were allowed to increase. The theoretical efficacy of RAW-UCB has been validated through synthetic and dataset-based experiments, as presented at AISTATS 2020.
Key takeaway
For AI Scientists developing adaptive systems where action rewards decay over time, RAW-UCB offers a robust solution. You can deploy this algorithm without needing to pre-classify whether the decay is due to user fatigue or external obsolescence, simplifying model design and improving performance across varied non-stationary environments. This eliminates the need for separate algorithms for rested versus restless rotting bandits.
Key insights
RAW-UCB unifies restless and rested rotting bandit problems, achieving near-optimal regret without prior setting knowledge.
Principles
- Rewards can decay due to internal or external factors.
- A single algorithm can address diverse rotting bandit settings.
Method
RAW-UCB is a novel algorithm designed to adaptively handle both rested and restless rotting bandit problems, achieving near-optimal regret without requiring explicit knowledge of the decay mechanism.
In practice
- Apply RAW-UCB in recommender systems.
- Utilize RAW-UCB for intelligent tutoring systems.
Topics
- Rotting Bandits
- Rested Bandits
- Restless Bandits
- RAW-UCB Algorithm
- Regret Minimization
Best for: AI Scientist, Research Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.