Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy

2026-05-11 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

This paper introduces Cost-Ordered Feasibility (COF), a novel algorithm designed to solve the Multi-Armed Bandit with Cost-Subsidy (MAB-CS) problem. MAB-CS aims to minimize cumulative cost while ensuring a minimum reward quality, defined as a fraction $(1-alpha)$ of the unknown best reward $mu^*$. The authors derive new instance-dependent lower bounds on sub-optimal samples, generalizing prior work. COF leverages these insights by evaluating arms in increasing order of cost, using confidence bounds to determine feasibility or infeasibility. It incorporates "combining samples" to aggregate confidence from multiple arms and "exclusive sampling" to prioritize under-sampled candidate arms. Empirical validation on MovieLens and Goodreads datasets, along with synthetic instances, demonstrates COF's superior performance in minimizing both cumulative cost and quality regret compared to baselines like PE-CS and ETC-CS, particularly avoiding the regret plateau seen in some prior methods.

Key takeaway

For Research Scientists developing online decision-making systems with cost and quality constraints, COF offers a robust approach. You should consider implementing COF, especially in applications like LLM routing or recommendation systems, as its instance-adaptive exploration and sample aggregation features demonstrably outperform prior methods in minimizing cumulative cost and quality regret, avoiding performance plateaus seen in alternatives.

Key insights

COF algorithm minimizes MAB-CS cost and quality regret by cost-ordered evaluation and sample aggregation.

Principles

Minimize cost subject to a relative reward constraint.
Combine evidence from multiple arms for robust decision-making.
Prioritize cheaper arms for feasibility evaluation.

Method

COF evaluates arms by increasing cost, using UCB/LCB to assess feasibility against $(1-alpha)mu^*$. It aggregates confidence from multiple arms and exclusively samples under-represented candidates to optimize regret.

In practice

Apply COF for LLM routing to balance quality and cost.
Use COF in recommendation systems for genre selection.
Consider cost-biased sampling for uneven cost distributions.

Topics

Multi-Armed Bandits with Cost-Subsidy
Cost-Ordered Feasibility Algorithm
Instance-Dependent Lower Bounds
Regret Minimization
Confidence Bound Schemes

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.