Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

2026-06-08 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

A new study introduces UCB for Arriving Arms (UCB-AA), an algorithm designed to address the multi-armed bandit problem where the set of available arms expands over time. This dynamic scenario, common in sequential experimentation, renders traditional regret metrics against a single best arm in hindsight inappropriate. UCB-AA evaluates performance relative to the best arm currently available, employing a dynamic-regret criterion. The algorithm features an elimination-based procedure with a preliminary screening step for newly arrived arms before they fully compete with incumbent arms. UCB-AA achieves regret bounds explicitly dependent on the arrival process, demonstrates sublinear dynamic regret under specific gap evolution conditions, and supports an online extension for unknown horizons. Simulation results indicate that UCB-AA effectively reduces wasted pulls and maintains a smaller active arm set while preserving competitive regret performance.

Key takeaway

For AI scientists designing sequential experimentation or dynamic optimization systems, UCB-AA is worth considering when new actions become available over time. Its preliminary screening step and elimination-based procedure reduce wasted pulls while keeping regret competitive in the reported simulations. The sublinear dynamic-regret guarantee holds under specific conditions on how the gaps evolve, so check those conditions against your setting before relying on it.

Key insights

UCB-AA targets dynamic regret in multi-armed bandits where new arms arrive sequentially; screening new arrivals reduces wasted pulls.

Principles

Traditional regret metrics are inappropriate for arriving-arm environments.
Dynamic regret evaluates performance relative to the best currently available arm.
Preliminary screening of new arms improves efficiency and reduces wasted pulls.
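
To make the difference between the two regret notions concrete, here is a minimal Python sketch that computes cumulative dynamic pseudo-regret in a simulation. The function name `dynamic_regret` and its parameters (`means`, `arrival`, `pulls`) are illustrative assumptions, not notation from the paper, and the true means are assumed known, which is only possible in a simulation.

```python
from typing import Sequence


def dynamic_regret(
    means: Sequence[float],
    arrival: Sequence[int],
    pulls: Sequence[int],
) -> float:
    """Cumulative pseudo-regret against the best arm available at each round.

    means[i]   -- true mean reward of arm i (assumed known; simulation only)
    arrival[i] -- first round (0-indexed) at which arm i can be pulled
    pulls[t]   -- index of the arm pulled at round t
    """
    total = 0.0
    for t, chosen in enumerate(pulls):
        if arrival[chosen] > t:
            raise ValueError(f"arm {chosen} pulled at round {t} before it arrived")
        # Benchmark: best mean among arms that have already arrived at round t.
        best_available = max(m for m, a in zip(means, arrival) if a <= t)
        total += best_available - means[chosen]
    return total
```

For example, with `means=[0.5, 0.8]`, `arrival=[0, 2]` and `pulls=[0, 0, 0, 1]`, only round 2 incurs regret (0.3), because the better arm did not exist in rounds 0 and 1. A benchmark of the single best arm in hindsight would instead charge 0.3 for each of the first three rounds, penalizing the learner for not pulling an arm that was unavailable.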

Method

UCB-AA is an elimination-based procedure that adds a preliminary screening step: newly arrived arms are screened before they compete fully with incumbent arms.
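
The summary does not give the algorithm's details, so the following is only a rough Python sketch of the general idea, not the paper's UCB-AA. It assumes Bernoulli rewards, a known horizon, a fixed screening budget (`screen_pulls`), and a standard successive-elimination rule with radius sqrt(2 log T / n). All names and constants here are hypothetical choices for illustration.

```python
import math
import random


def screened_elimination(means, arrival, horizon, screen_pulls=20, seed=0):
    """Illustrative screening-then-elimination loop (not the paper's UCB-AA).

    means[i]   -- Bernoulli mean of arm i, used only to simulate rewards
    arrival[i] -- round at which arm i becomes available (some arm must be 0)
    """
    if 0 not in arrival:
        raise ValueError("at least one arm must be available at round 0")
    rng = random.Random(seed)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    screening, active, eliminated = set(), set(), set()
    pulls = []

    def bounds(i):
        # Lower and upper confidence bounds for arm i (assumed radius).
        mean = sums[i] / counts[i]
        rad = math.sqrt(2.0 * math.log(horizon) / counts[i])
        return mean - rad, mean + rad

    for t in range(horizon):
        # Newly arrived arms enter the screening set, not the active set.
        screening.update(i for i, a in enumerate(arrival) if a == t)
        # Screening arms are sampled first; otherwise round-robin over active arms.
        pool = screening if screening else active
        arm = min(pool, key=lambda i: (counts[i], i))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
        # After its screening budget, an arm starts competing with incumbents.
        if arm in screening and counts[arm] >= screen_pulls:
            screening.remove(arm)
            active.add(arm)
        if active:
            # Drop any active arm whose UCB falls below the best LCB.
            best_lcb = max(bounds(i)[0] for i in active)
            losers = {i for i in active if bounds(i)[1] < best_lcb}
            active -= losers
            eliminated |= losers
    return pulls, active, eliminated
```

A call such as `screened_elimination([0.5, 0.1, 0.9], [0, 0, 500], horizon=3000)` returns the pull sequence together with the final active and eliminated sets. In this sketch, screening is just a forced minimum sample before an arm is compared with incumbents; the paper's screening rule may differ.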

In practice

Reduce wasted pulls in sequential experimentation.
Maintain smaller active arm sets in dynamic environments.

Topics

Multi-Armed Bandits
Dynamic Regret
Sequential Experimentation
UCB-AA
Machine Learning
Algorithm Design

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.