Online LLM Selection via Constrained Bandits with Time-Varying Demand
Summary
Large Language Models (LLMs) are increasingly deployed in edge-cloud inference systems, where an appropriate model must be chosen for diverse user tasks from candidates with heterogeneous accuracy, latency, and cost profiles. Static selection strategies fall short because models differ, their performance is stochastic, and task demand varies over time. Real-world deployments add hard resource budgets, such as monetary expenditure limits, and soft service-level requirements, such as latency guarantees. The article formulates this as a constrained stochastic bandit learning problem in which a learner sequentially selects models under both packing-type (hard) and covering-type (soft) constraints, adapting to time-varying demand without full knowledge of the underlying distributions. It introduces a novel online learning algorithm that uses confidence-bound estimates and demand predictions to balance reward maximization with long-term constraint satisfaction. Theoretical guarantees establish sublinear regret and sublinear covering-constraint violation against an offline benchmark, and experiments on synthetic workloads confirm the algorithm's robustness in dynamic, resource-constrained settings.
Key takeaway
If you are an AI architect managing LLM deployments in dynamic, resource-constrained edge-cloud environments, consider implementing adaptive online selection strategies. Static LLM selection cannot handle time-varying demand and heterogeneous model performance. A constrained bandit learning approach that integrates demand predictions and confidence bounds can help you balance service quality, such as latency guarantees, against hard resource budgets, such as monetary expenditure limits.
Key insights
Online LLM selection under dynamic demand and hard/soft constraints can be optimized using constrained bandit learning.
Principles
- LLM selection must adapt to time-varying task demand.
- Constraint satisfaction is critical alongside reward maximization.
Method
A novel online learning algorithm uses confidence-bound estimates and demand predictions to sequentially select LLMs, balancing reward maximization with long-term packing and covering constraint satisfaction.
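To make the idea concrete, here is a minimal sketch of one common way to combine confidence bounds with hard and soft constraints: a primal-dual (Lagrangian) UCB selector. It is not the article's algorithm. The class name `PrimalDualUCBSelector`, the step size, the example model profiles, and the demand pattern are all hypothetical, and the sketch assumes that per-request reward, cost, and service signals lie in [0, 1] and that realized demand never exceeds the prediction.

```python
import math
import random


class PrimalDualUCBSelector:
    """Illustrative primal-dual UCB selector (hypothetical, not the article's method)."""

    def __init__(self, n_models, budget, total_predicted_demand,
                 service_target, step=0.05):
        self.n = n_models
        self.budget_left = float(budget)                 # hard (packing) budget
        self.cost_rate = budget / total_predicted_demand  # per-request spend target
        self.service_target = service_target             # soft (covering) target
        self.step = step                                 # dual step size (assumed)
        self.lam_cost = 0.0                              # multiplier for the budget
        self.lam_service = 0.0                           # multiplier for the service level
        self.counts = [0] * n_models
        self.means = [[0.0, 0.0, 0.0] for _ in range(n_models)]  # reward, cost, service
        self.t = 0

    def _optimistic(self, k):
        # Optimism: upper bounds on reward and service, lower bound on cost.
        if self.counts[k] == 0:
            return 1.0, 0.0, 1.0
        rad = math.sqrt(2.0 * math.log(self.t + 1) / self.counts[k])
        reward, cost, service = self.means[k]
        return min(1.0, reward + rad), max(0.0, cost - rad), min(1.0, service + rad)

    def select(self, predicted_demand):
        self.t += 1
        # Worst-case cost is 1 per request, so skip the round (return None)
        # whenever serving the predicted demand could overrun the budget.
        if self.budget_left < predicted_demand:
            return None

        def score(k):
            r, c, s = self._optimistic(k)
            return r - self.lam_cost * c + self.lam_service * s

        return max(range(self.n), key=score)

    def update(self, k, demand, reward, cost, service):
        # Running means of the per-request observations for model k.
        self.counts[k] += 1
        for i, x in enumerate((reward, cost, service)):
            self.means[k][i] += (x - self.means[k][i]) / self.counts[k]
        self.budget_left -= demand * cost
        # Demand-weighted dual ascent: a multiplier grows while its constraint is strained.
        self.lam_cost = max(
            0.0, self.lam_cost + self.step * demand * (cost - self.cost_rate))
        self.lam_service = max(
            0.0, self.lam_service + self.step * demand * (self.service_target - service))


def simulate(seed=0, horizon=500):
    rng = random.Random(seed)
    # Made-up (accuracy, cost, on-time probability) per request for three models.
    models = [(0.55, 0.1, 0.95), (0.75, 0.4, 0.80), (0.90, 0.9, 0.60)]
    demand = [1 + (t // 50) % 3 for t in range(horizon)]  # predicted load per round
    budget = 0.4 * sum(demand)
    sel = PrimalDualUCBSelector(len(models), budget, sum(demand), service_target=0.8)
    spent, picks = 0.0, []
    for d in demand:
        k = sel.select(d)
        picks.append(k)
        if k is None:
            continue
        acc, cost, on_time = models[k]
        reward = float(rng.random() < acc)       # one noisy accuracy signal per round
        service = float(rng.random() < on_time)  # 1.0 if the latency target was met
        noisy_cost = min(1.0, max(0.0, cost + rng.uniform(-0.05, 0.05)))
        sel.update(k, d, reward, noisy_cost, service)
        spent += d * noisy_cost
    return spent, budget, picks
```

Calling `simulate()` returns the total spend, the budget, and the per-round choices. In this sketch the hard budget is enforced by refusing to serve once the remaining budget cannot cover the worst case, while the soft service target is only encouraged through its multiplier, so it can be violated in individual rounds. Demand enters through the feasibility check and the weighting of the dual updates.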
In practice
- Deploy adaptive LLM selection in edge-cloud systems.
- Integrate demand predictions for dynamic model choices.
Topics
- Large Language Models
- Online Learning
- Constrained Bandits
- Edge-Cloud Inference
- Resource Management
- Demand Prediction
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.