A single algorithm for both restless and rested rotting bandits

2026-04-24 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new algorithm, Rotting Adaptive Window UCB (RAW-UCB), addresses the challenge of "rotting bandits" in applications like recommender systems and intelligent tutoring. These systems experience decreasing rewards over time, either due to past actions (rested rotting) or external factors (restless rotting). Previously, these two settings were considered distinct, with algorithms performing poorly when misapplied. RAW-UCB achieves near-optimal regret in both rested and restless rotting bandit scenarios without requiring prior knowledge of the setting or the type of non-stationarity (e.g., piece-wise constant, bounded variation). This contrasts with earlier findings that suggested no single algorithm could perform well if rewards were allowed to increase. The theoretical efficacy of RAW-UCB has been validated through synthetic and dataset-based experiments, as presented at AISTATS 2020.

Key takeaway

For AI Scientists developing adaptive systems where action rewards decay over time, RAW-UCB offers a robust solution. You can deploy this algorithm without needing to pre-classify whether the decay is due to user fatigue or external obsolescence, simplifying model design and improving performance across varied non-stationary environments. This eliminates the need for separate algorithms for rested versus restless rotting bandits.

Key insights

RAW-UCB unifies restless and rested rotting bandit problems, achieving near-optimal regret without prior setting knowledge.

Principles

Rewards can decay due to internal or external factors.
A single algorithm can address diverse rotting bandit settings.

Method

RAW-UCB is a novel algorithm designed to adaptively handle both rested and restless rotting bandit problems, achieving near-optimal regret without requiring explicit knowledge of the decay mechanism.

In practice

Apply RAW-UCB in recommender systems.
Utilize RAW-UCB for intelligent tutoring systems.

Topics

Rotting Bandits
Rested Bandits
Restless Bandits
RAW-UCB Algorithm
Regret Minimization

Best for: AI Scientist, Research Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.