Replicable Bandits with UCB based Exploration
Summary
A new study introduces replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits, focusing on Upper Confidence Bound (UCB) based exploration. The research defines a bandit algorithm as ρ-replicable if two executions, sharing internal randomness but having independent reward realizations, produce identical action sequences with a probability of at least 1-ρ. Unlike prior elimination-based methods that often rely on discretization and lead to suboptimal dependence on dimension d and ρ, this work develops optimistic alternatives. For MAB, the authors propose RepUCB, a batched UCB algorithm achieving a regret of O(K^2log^2 T/ρ^2 ∑_{a:Δ_a>0}(Δ_a+log(KTlogT)/Δ_a)). For linear bandits, they introduce RepRidge, a replicable ridge regression estimator with confidence and ρ-replicability guarantees, which is then used in RepLinUCB. RepLinUCB demonstrates a regret bounded by Õ((d+d^3/ρ)√T), improving previous guarantees by a factor of O(d/ρ).
Key takeaway
For research scientists developing or deploying bandit algorithms, this work demonstrates that optimistic UCB-based approaches can significantly enhance replicability while improving regret bounds. You should consider integrating RepUCB for multi-armed bandit problems and RepLinUCB, leveraging the RepRidge estimator, for linear bandit scenarios to achieve more robust and consistent experimental outcomes, especially where reproducibility is critical.
Key insights
New UCB-based algorithms improve replicability and regret bounds for stochastic multi-armed and linear bandits.
Principles
- Replicability requires consistent action sequences.
- Optimistic algorithms can reduce replicability costs.
Method
The RepUCB algorithm uses batched UCB for MAB. RepLinUCB employs RepRidge, a novel replicable ridge regression estimator, for stochastic linear bandits.
In practice
- Apply RepUCB for replicable MAB solutions.
- Utilize RepLinUCB for improved linear bandit performance.
Topics
- Replicable Bandits
- UCB Exploration
- Stochastic Multi-Armed Bandits
- Stochastic Linear Bandits
- RepUCB Algorithm
Best for: Research Scientist, AI Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.