Best of both worlds: Stochastic & adversarial best-arm identification

2026-04-17 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new study submitted to arXiv on April 16, 2026, investigates best-arm identification in bandit problems with arbitrary and potentially adversarial rewards. While a simple uniform random learner achieves optimal error rates in adversarial scenarios, it performs suboptimally with stochastic rewards. The research explores the feasibility of designing a single learner that performs optimally in both stochastic and adversarial reward environments without prior knowledge of the reward nature. The authors demonstrate that such a universal learner is generally impossible, establishing that robustness to adversarial rewards limits optimal error rates to a subset of stochastic problems. They provide a lower bound characterizing the optimal rate under this constraint and introduce a simple, parameter-free algorithm that matches this lower bound (up to log factors) in stochastic problems while maintaining robustness to adversarial rewards.

Key takeaway

For research scientists developing bandit algorithms, you should recognize the inherent trade-off between optimal performance in stochastic versus adversarial reward settings. Your design choices must prioritize either robustness or stochastic optimality, as a single universally optimal learner is not feasible. Consider implementing the proposed parameter-free algorithm when facing environments with unknown or mixed reward characteristics to achieve a "best of both worlds" performance profile.

Key insights

Optimal best-arm identification across both stochastic and adversarial bandit rewards is generally impossible.

Principles

Adversarial robustness limits stochastic optimality.
Uniform random learners are suboptimal for stochastic rewards.

Method

The paper designs a parameter-free algorithm that achieves near-optimal error rates in stochastic bandit problems, matching a derived lower bound, while also being robust to adversarial rewards.

In practice

Consider reward nature when selecting bandit strategies.
Use the proposed algorithm for mixed reward environments.

Topics

Best-Arm Identification
Bandit Algorithms
Stochastic Rewards
Adversarial Rewards
Optimal Rates

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.