Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees
Summary
A new study introduces UCB for Arriving Arms (UCB-AA), an algorithm designed to address the multi-armed bandit problem where the set of available arms expands over time. This dynamic scenario, common in sequential experimentation, renders traditional regret metrics against a single best arm in hindsight inappropriate. UCB-AA evaluates performance relative to the best arm currently available, employing a dynamic-regret criterion. The algorithm features an elimination-based procedure with a preliminary screening step for newly arrived arms before they fully compete with incumbent arms. UCB-AA achieves regret bounds explicitly dependent on the arrival process, demonstrates sublinear dynamic regret under specific gap evolution conditions, and supports an online extension for unknown horizons. Simulation results indicate that UCB-AA effectively reduces wasted pulls and maintains a smaller active arm set while preserving competitive regret performance.
Key takeaway
For AI scientists designing sequential experimentation or dynamic optimization systems, UCB-AA is worth considering when new actions become available over time. Its preliminary screening step and elimination-based procedure reduce wasted pulls while keeping regret competitive in the reported simulations. Its dynamic regret is sublinear under specific conditions on how the gaps evolve, which makes it a candidate for settings where the best available choice changes.
Key insights
UCB-AA targets dynamic regret in multi-armed bandits where new arms arrive sequentially, and it screens each new arm before full competition so that fewer pulls are wasted.
Principles
- Regret measured against a single best arm in hindsight is inappropriate when the arm set expands over time.
- Dynamic regret evaluates performance relative to the best currently available arm.
- Preliminary screening of new arms improves efficiency and reduces wasted pulls.
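To make the dynamic-regret criterion concrete, here is a minimal sketch of how it could be computed for a recorded pull sequence. It assumes each arm has a fixed true mean and a known arrival round; the function name `dynamic_regret` and its arguments are hypothetical and are not taken from the original paper.

```python
def dynamic_regret(means, arrival_times, pulls):
    """Cumulative dynamic regret of a pull sequence (illustrative only).

    means[i]: true mean reward of arm i (assumed fixed over time)
    arrival_times[i]: first round (0-indexed) in which arm i can be pulled
    pulls[t]: index of the arm pulled in round t
    """
    total = 0.0
    for t, arm in enumerate(pulls):
        if arrival_times[arm] > t:
            raise ValueError(f"arm {arm} pulled in round {t} before it arrived")
        # Benchmark: best mean among the arms that have arrived by round t,
        # not the best arm over the whole horizon.
        best_available = max(m for m, a in zip(means, arrival_times) if a <= t)
        total += best_available - means[arm]
    return total


# Arm 2 is the best arm overall but only arrives in round 2.
means = [0.5, 0.25, 0.75]
arrival_times = [0, 0, 2]
pulls = [0, 0, 0, 2]
print(dynamic_regret(means, arrival_times, pulls))  # 0.25
```

In this example, pulling arm 0 in rounds 0 and 1 costs nothing because arm 2 has not arrived yet; only round 2, where arm 2 is available but arm 0 is pulled, contributes regret. A benchmark fixed to the single best arm in hindsight would instead charge those first two rounds as well.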
Method
UCB-AA is an elimination-based procedure in which each newly arrived arm passes a preliminary screening step before it competes fully with the incumbent arms.
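The summary does not give the details of UCB-AA, so the following is only a generic sketch of the idea of screening arrivals before elimination, not the authors' algorithm. Everything in it is an assumption made for illustration: Bernoulli rewards, a fixed screening budget `screen_pulls`, a Hoeffding-style confidence radius, a known horizon, and the hypothetical function name `run_screened_elimination`.

```python
import math
import random


def radius(n, horizon):
    # Hoeffding-style confidence radius (an assumed choice, not from the paper).
    return math.sqrt(2.0 * math.log(horizon) / n)


def run_screened_elimination(means, arrival_times, horizon, screen_pulls=5, seed=0):
    """Generic elimination loop with a screening queue for arriving arms."""
    if 0 not in arrival_times:
        raise ValueError("this sketch assumes at least one arm arrives in round 0")
    rng = random.Random(seed)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    active, screening, pulls = [], [], []

    def ucb(i):
        return sums[i] / counts[i] + radius(counts[i], horizon)

    def lcb(i):
        return sums[i] / counts[i] - radius(counts[i], horizon)

    for t in range(horizon):
        # Arms arriving in this round join the screening queue first.
        screening.extend(i for i, a in enumerate(arrival_times) if a == t)

        # Screen the oldest waiting arm; otherwise pull the least-pulled active arm.
        if screening:
            arm = screening[0]
        else:
            arm = min(active, key=lambda i: counts[i])
        reward = 1.0 if rng.random() < means[arm] else 0.0  # simulated Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)

        # Once the screening budget is spent, promote the arm only if it could
        # still plausibly beat the incumbents; otherwise drop it.
        if screening and counts[arm] >= screen_pulls:
            screening.pop(0)
            if not active or ucb(arm) >= max(lcb(i) for i in active):
                active.append(arm)

        # Elimination: drop active arms whose upper bound falls below the
        # best lower bound among active arms.
        if active:
            best_lcb = max(lcb(i) for i in active)
            active = [i for i in active if ucb(i) >= best_lcb]
    return pulls, active


means = [0.6, 0.2, 0.8]
arrival_times = [0, 0, 300]
pulls, active = run_screened_elimination(means, arrival_times, horizon=2000)
print("arms still active:", active)
```

The design choice worth noting is that a screened arm never enters the active set unless its upper confidence bound reaches the best incumbent lower bound, which is one simple way to keep the active set small. How UCB-AA sets its screening rule and confidence widths is specified in the original paper, not here.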
In practice
- Reduce wasted pulls in sequential experimentation.
- Maintain smaller active arm sets in dynamic environments.
Topics
- Multi-Armed Bandits
- Dynamic Regret
- Sequential Experimentation
- UCB-AA
- Machine Learning
- Algorithm Design
Best for: Research Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.