Online LLM Selection via Constrained Bandits with Time-Varying Demand

2026-06-16 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Expert, quick

Summary

Large Language Models (LLMs) are increasingly deployed in edge-cloud inference systems, where an operator must choose among models with heterogeneous accuracy, latency, and cost profiles to serve diverse user tasks. Static selection strategies fall short because model performance is stochastic and task demand varies over time. Real-world deployments add two kinds of constraints: hard resource budgets, such as monetary expenditure limits, and soft service-level requirements, such as latency guarantees. The paper formulates this as a constrained stochastic bandit problem in which a learner sequentially selects models under both packing-type (hard) and covering-type (soft) constraints, adapting to time-varying demand without full knowledge of the underlying distributions. It introduces an online learning algorithm that uses confidence-bound estimates and demand predictions to balance reward maximization against long-term constraint satisfaction. Theoretical guarantees show sublinear regret and sublinear covering-constraint violation relative to an offline benchmark, and experiments on synthetic workloads support its robustness in dynamic, resource-constrained settings.
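
One way to write down the problem described above is sketched here. The notation is an assumption made for illustration and is not taken from the paper: a_t is the model chosen in round t, r_t its reward, c_t its cost, g_t a service-quality signal (for example, 1 if the latency target was met), B the hard budget, and \rho the per-round service target. The distributions of r_t, c_t, and g_t may depend on the task demand in round t.

```latex
\begin{align}
\max_{a_1,\dots,a_T} \quad & \mathbb{E}\Big[\sum_{t=1}^{T} r_t(a_t)\Big] \\
\text{s.t.} \quad & \sum_{t=1}^{T} c_t(a_t) \le B
  && \text{(packing, hard)} \\
& \sum_{t=1}^{T} g_t(a_t) \ge \rho\, T
  && \text{(covering, soft)}
\end{align}

% Performance measures against the offline benchmark value OPT:
\begin{align}
\mathrm{Reg}(T) &= \mathrm{OPT} - \mathbb{E}\Big[\sum_{t=1}^{T} r_t(a_t)\Big] = o(T), \\
\mathrm{Vio}(T) &= \Big[\rho\, T - \sum_{t=1}^{T} g_t(a_t)\Big]_{+} = o(T).
\end{align}
```

Under this reading, "sublinear" means that both the average per-round regret and the average per-round shortfall on the covering constraint vanish as T grows, while the packing constraint is never exceeded.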

Key takeaway

If you are an AI architect managing LLM deployments in dynamic, resource-constrained edge-cloud environments, consider adaptive online model selection. Static selection cannot keep up with time-varying demand and heterogeneous, stochastic model performance. A constrained bandit approach that combines demand predictions with confidence bounds can help you trade off service quality, such as latency guarantees, against hard resource budgets, such as monetary expenditure limits.

Key insights

Online LLM selection under dynamic demand can be framed as a constrained bandit learning problem, with packing constraints for hard budgets and covering constraints for soft service-level requirements.

Principles

LLM selection must adapt to time-varying task demand.
Constraint satisfaction is critical alongside reward maximization.

Method

A novel online learning algorithm uses confidence-bound estimates and demand predictions to sequentially select LLMs, balancing reward maximization with long-term packing and covering constraint satisfaction.
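
The general idea can be illustrated with a minimal primal-dual UCB sketch. This is not the paper's algorithm; it is a common constrained-bandit pattern used here as a stand-in. All names (`ConstrainedUCBSelector`, `run_demo`, `cover_target`, `predicted_demand`), the model profiles, and the sinusoidal demand forecast are hypothetical, and reward, cost, and service feedback are assumed to lie in [0, 1].

```python
import math
import random


class ConstrainedUCBSelector:
    """Illustrative primal-dual UCB model selector (not the paper's algorithm).

    Assumes reward, cost, and service feedback all lie in [0, 1].
    """

    def __init__(self, n_models, budget, horizon, cover_target, step=0.05):
        self.n_models = n_models
        self.remaining = budget            # hard (packing) budget left
        self.pace = budget / horizon       # average spend allowed per round
        self.cover_target = cover_target   # soft (covering) per-round target
        self.step = step                   # dual step size
        self.lam_cost = 0.0                # price on spending
        self.lam_cover = 0.0               # bonus for meeting the service target
        self.t = 0
        self.counts = [0] * n_models
        self.sums = [[0.0, 0.0, 0.0] for _ in range(n_models)]

    def _bounds(self, k):
        """Optimistic estimates: upper bounds on reward/service, lower bound on cost."""
        n = self.counts[k]
        if n == 0:
            return 1.0, 0.0, 1.0
        radius = math.sqrt(2.0 * math.log(self.t + 1) / n)
        reward, cost, service = (s / n for s in self.sums[k])
        return (min(1.0, reward + radius),
                max(0.0, cost - radius),
                min(1.0, service + radius))

    def select(self):
        """Return a model index, or None once the hard budget could be overrun."""
        if self.remaining < 1.0:           # worst-case cost per round is 1.0
            return None
        self.t += 1

        def score(k):
            r, c, g = self._bounds(k)
            return r - self.lam_cost * c + self.lam_cover * g

        return max(range(self.n_models), key=score)

    def update(self, k, reward, cost, service, predicted_demand=1.0):
        """Record feedback and take one dual ascent step per constraint."""
        self.counts[k] += 1
        for i, x in enumerate((reward, cost, service)):
            self.sums[k][i] += x
        self.remaining -= cost
        # Demand-aware pacing: allow more spend in rounds forecast to be busy.
        target_spend = self.pace * predicted_demand
        self.lam_cost = max(0.0, self.lam_cost + self.step * (cost - target_spend))
        self.lam_cover = max(
            0.0, self.lam_cover + self.step * (self.cover_target - service))


def run_demo(seed=0, horizon=500, budget=150.0):
    """Simulate three hypothetical models with made-up (reward, cost, on-time) profiles."""
    rng = random.Random(seed)
    profiles = [(0.5, 0.1, 0.9), (0.7, 0.3, 0.7), (0.9, 0.6, 0.5)]
    selector = ConstrainedUCBSelector(len(profiles), budget, horizon, cover_target=0.7)
    spent, rounds = 0.0, 0
    for t in range(horizon):
        k = selector.select()
        if k is None:
            break
        mean_reward, cost, p_on_time = profiles[k]
        reward = 1.0 if rng.random() < mean_reward else 0.0
        service = 1.0 if rng.random() < p_on_time else 0.0
        demand = 1.0 + 0.5 * math.sin(2 * math.pi * t / 100)  # stand-in forecast
        selector.update(k, reward, cost, service, predicted_demand=demand)
        spent += cost
        rounds += 1
    return spent, rounds
```

In this sketch the hard budget is enforced by refusing to select once the remaining budget could not absorb a worst-case cost, while the soft service target only influences the score through its dual variable, so it can be missed in individual rounds. Calling `run_demo()` returns the total spend and the number of rounds served for the simulated workload.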

In practice

Deploy adaptive LLM selection in edge-cloud systems.
Integrate demand predictions for dynamic model choices.

Topics

Large Language Models
Online Learning
Constrained Bandits
Edge-Cloud Inference
Resource Management
Demand Prediction

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.