Best of both worlds: Stochastic & adversarial best-arm identification
Summary
A new study submitted to arXiv on April 16, 2026, investigates best-arm identification in bandit problems with arbitrary and potentially adversarial rewards. While a simple uniform random learner achieves optimal error rates in adversarial scenarios, it performs suboptimally with stochastic rewards. The research explores the feasibility of designing a single learner that performs optimally in both stochastic and adversarial reward environments without prior knowledge of the reward nature. The authors demonstrate that such a universal learner is generally impossible, establishing that robustness to adversarial rewards limits optimal error rates to a subset of stochastic problems. They provide a lower bound characterizing the optimal rate under this constraint and introduce a simple, parameter-free algorithm that matches this lower bound (up to log factors) in stochastic problems while maintaining robustness to adversarial rewards.
Key takeaway
For research scientists developing bandit algorithms, you should recognize the inherent trade-off between optimal performance in stochastic versus adversarial reward settings. Your design choices must prioritize either robustness or stochastic optimality, as a single universally optimal learner is not feasible. Consider implementing the proposed parameter-free algorithm when facing environments with unknown or mixed reward characteristics to achieve a "best of both worlds" performance profile.
Key insights
Optimal best-arm identification across both stochastic and adversarial bandit rewards is generally impossible.
Principles
- Adversarial robustness limits stochastic optimality.
- Uniform random learners are suboptimal for stochastic rewards.
Method
The paper designs a parameter-free algorithm that achieves near-optimal error rates in stochastic bandit problems, matching a derived lower bound, while also being robust to adversarial rewards.
In practice
- Consider reward nature when selecting bandit strategies.
- Use the proposed algorithm for mixed reward environments.
Topics
- Best-Arm Identification
- Bandit Algorithms
- Stochastic Rewards
- Adversarial Rewards
- Optimal Rates
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.