Replicable Bandits with UCB based Exploration

2026-04-23 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new study introduces replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits, focusing on Upper Confidence Bound (UCB) based exploration. The research defines a bandit algorithm as ρ-replicable if two executions, sharing internal randomness but having independent reward realizations, produce identical action sequences with a probability of at least 1-ρ. Unlike prior elimination-based methods that often rely on discretization and lead to suboptimal dependence on dimension d and ρ, this work develops optimistic alternatives. For MAB, the authors propose RepUCB, a batched UCB algorithm achieving a regret of O(K^2log^2 T/ρ^2 ∑_{a:Δ_a>0}(Δ_a+log(KTlogT)/Δ_a)). For linear bandits, they introduce RepRidge, a replicable ridge regression estimator with confidence and ρ-replicability guarantees, which is then used in RepLinUCB. RepLinUCB demonstrates a regret bounded by Õ((d+d^3/ρ)√T), improving previous guarantees by a factor of O(d/ρ).

Key takeaway

For research scientists developing or deploying bandit algorithms, this work demonstrates that optimistic UCB-based approaches can significantly enhance replicability while improving regret bounds. You should consider integrating RepUCB for multi-armed bandit problems and RepLinUCB, leveraging the RepRidge estimator, for linear bandit scenarios to achieve more robust and consistent experimental outcomes, especially where reproducibility is critical.

Key insights

New UCB-based algorithms improve replicability and regret bounds for stochastic multi-armed and linear bandits.

Principles

Replicability requires consistent action sequences.
Optimistic algorithms can reduce replicability costs.

Method

The RepUCB algorithm uses batched UCB for MAB. RepLinUCB employs RepRidge, a novel replicable ridge regression estimator, for stochastic linear bandits.

In practice

Apply RepUCB for replicable MAB solutions.
Utilize RepLinUCB for improved linear bandit performance.

Topics

Replicable Bandits
UCB Exploration
Stochastic Multi-Armed Bandits
Stochastic Linear Bandits
RepUCB Algorithm

Best for: Research Scientist, AI Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.