Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy
Summary
This paper introduces Cost-Ordered Feasibility (COF), an algorithm for the Multi-Armed Bandit with Cost-Subsidy (MAB-CS) problem. MAB-CS aims to minimize cumulative cost while maintaining a minimum reward quality, defined as a fraction $(1-\alpha)$ of the unknown best reward $\mu^*$. The authors derive new instance-dependent lower bounds on the number of sub-optimal samples, generalizing prior work. COF builds on these bounds by evaluating arms in increasing order of cost and using confidence bounds to decide whether each arm is feasible or infeasible. It adds two mechanisms: "combining samples," which aggregates confidence from multiple arms, and "exclusive sampling," which prioritizes under-sampled candidate arms. In experiments on the MovieLens and Goodreads datasets and on synthetic instances, COF achieves lower cumulative cost regret and quality regret than baselines such as PE-CS and ETC-CS, and in particular avoids the regret plateau seen in some prior methods.
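To make the MAB-CS objective concrete, here is a minimal sketch of the target arm when the mean rewards are known: the cheapest arm whose mean reward is at least $(1-\alpha)\mu^*$. The function name `cheapest_feasible_arm` and the example numbers are illustrative assumptions, not taken from the paper; in the bandit setting the means are unknown and have to be estimated from samples.

```python
def cheapest_feasible_arm(means, costs, alpha):
    """Return the index of the cheapest arm whose mean reward meets the threshold.

    Assumes the true mean rewards are known (an oracle view for illustration).
    """
    # Quality threshold: a (1 - alpha) fraction of the best mean reward.
    threshold = (1 - alpha) * max(means)
    # Arms whose mean reward satisfies the quality constraint.
    feasible = [i for i, m in enumerate(means) if m >= threshold]
    # Among feasible arms, pick the one with the lowest cost.
    return min(feasible, key=lambda i: costs[i])


# Hypothetical instance: three arms with increasing cost and reward.
means = [0.50, 0.72, 0.80]
costs = [0.10, 0.30, 0.90]
best = cheapest_feasible_arm(means, costs, alpha=0.2)
```

With these numbers the threshold is about 0.64, so the middle arm is selected: it is cheaper than the best-reward arm and still meets the quality constraint.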
Key takeaway
For research scientists building online decision-making systems under cost and quality constraints, COF is worth considering, especially for applications such as LLM routing or recommendation systems. In the reported experiments, its instance-adaptive exploration and sample aggregation yield lower cumulative cost regret and quality regret than prior methods and avoid the performance plateaus seen in some alternatives.
Key insights
The COF algorithm reduces cost and quality regret in MAB-CS through cost-ordered evaluation and sample aggregation.
Principles
- Minimize cost subject to a relative reward constraint.
- Combine evidence from multiple arms for robust decision-making.
- Prioritize cheaper arms for feasibility evaluation.
Method
COF evaluates arms in order of increasing cost, using upper and lower confidence bounds (UCB/LCB) to assess each arm's feasibility against the threshold $(1-\alpha)\mu^*$. It aggregates confidence from multiple arms and exclusively samples under-represented candidate arms to reduce regret.
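The following is a simplified sketch of a cost-ordered feasibility check based on confidence bounds. It is not the paper's COF algorithm: it leaves out "combining samples" and "exclusive sampling," and the Hoeffding-style radius, the label names, and the helpers `confidence_radius`, `classify_arms`, and `first_candidate` are all assumptions made for illustration.

```python
import math


def confidence_radius(n_pulls, t, scale=2.0):
    """Hoeffding-style confidence radius (a generic choice, assumed here)."""
    if n_pulls == 0:
        return float("inf")  # no samples yet: nothing can be ruled in or out
    return math.sqrt(scale * math.log(max(t, 2)) / n_pulls)


def classify_arms(emp_means, pulls, alpha, t):
    """Label each arm 'feasible', 'infeasible', or 'unknown' at round t."""
    radii = [confidence_radius(n, t) for n in pulls]
    ucb = [m + r for m, r in zip(emp_means, radii)]
    lcb = [m - r for m, r in zip(emp_means, radii)]
    best_ucb, best_lcb = max(ucb), max(lcb)
    labels = []
    for i in range(len(emp_means)):
        if lcb[i] >= (1 - alpha) * best_ucb:
            # Even the most optimistic threshold is met.
            labels.append("feasible")
        elif ucb[i] < (1 - alpha) * best_lcb:
            # Even the most pessimistic threshold is missed.
            labels.append("infeasible")
        else:
            labels.append("unknown")
    return labels


def first_candidate(costs, labels):
    """Walk arms in increasing cost; return the first not ruled infeasible."""
    for i in sorted(range(len(costs)), key=lambda j: costs[j]):
        if labels[i] != "infeasible":
            return i
    return None


# Hypothetical statistics after 3000 rounds, 1000 pulls per arm.
costs = [0.10, 0.30, 0.90]
labels = classify_arms([0.30, 0.70, 0.80], [1000, 1000, 1000], alpha=0.1, t=3000)
candidate = first_candidate(costs, labels)
```

In this example the cheapest arm is ruled infeasible, while the other two remain undetermined, so the middle arm is the cheapest candidate still worth sampling. With many more samples the intervals shrink and arms move from "unknown" to a definite label.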
In practice
- Apply COF for LLM routing to balance quality and cost.
- Use COF in recommendation systems for genre selection.
- Consider cost-biased sampling for uneven cost distributions.
Topics
- Multi-Armed Bandits with Cost-Subsidy
- Cost-Ordered Feasibility Algorithm
- Instance-Dependent Lower Bounds
- Regret Minimization
- Confidence Bound Schemes
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.