Replicable Bandits with UCB based Exploration

· Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new study introduces replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits, focusing on Upper Confidence Bound (UCB) based exploration. The research defines a bandit algorithm as ρ-replicable if two executions, sharing internal randomness but having independent reward realizations, produce identical action sequences with a probability of at least 1-ρ. Unlike prior elimination-based methods that often rely on discretization and lead to suboptimal dependence on dimension d and ρ, this work develops optimistic alternatives. For MAB, the authors propose RepUCB, a batched UCB algorithm achieving a regret of O(K^2log^2 T/ρ^2 ∑_{a:Δ_a>0}(Δ_a+log(KTlogT)/Δ_a)). For linear bandits, they introduce RepRidge, a replicable ridge regression estimator with confidence and ρ-replicability guarantees, which is then used in RepLinUCB. RepLinUCB demonstrates a regret bounded by Õ((d+d^3/ρ)√T), improving previous guarantees by a factor of O(d/ρ).

Key takeaway

For research scientists developing or deploying bandit algorithms, this work demonstrates that optimistic UCB-based approaches can significantly enhance replicability while improving regret bounds. You should consider integrating RepUCB for multi-armed bandit problems and RepLinUCB, leveraging the RepRidge estimator, for linear bandit scenarios to achieve more robust and consistent experimental outcomes, especially where reproducibility is critical.

Key insights

New UCB-based algorithms improve replicability and regret bounds for stochastic multi-armed and linear bandits.

Principles

Method

The RepUCB algorithm uses batched UCB for MAB. RepLinUCB employs RepRidge, a novel replicable ridge regression estimator, for stochastic linear bandits.

In practice

Topics

Best for: Research Scientist, AI Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.