Agents
Online LLM Selection via Constrained Bandits with Time-Varying Demand
This article proposes a constrained stochastic bandit framework for online selection of Large Language Models (LLMs) in edge-cloud inference systems. The algorithm handles heterogeneous models and time-varying task demand under both hard and soft resource constraints, and it achieves sublinear regret and sublinear constraint violation in dynamic environments. For practitioners, it offers a way to choose among LLMs in real time while staying within resource limits.
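To make the idea concrete, here is a minimal sketch of one way a constrained bandit selector could look. It is not the article's algorithm. It assumes a UCB-style index, a hard per-round resource cap enforced by filtering out infeasible models, and a soft long-run budget priced through a dual variable. All names (`ConstrainedUCBSelector`, `hard_cap`, `soft_budget`, `dual_step`, `known_footprint`, `simulate`) and the synthetic reward and cost values are hypothetical.

```python
import math
import random


class ConstrainedUCBSelector:
    """Hypothetical sketch: UCB over models with a hard cap and a soft budget."""

    def __init__(self, n_models, hard_cap, soft_budget, dual_step=0.05):
        self.n = n_models
        self.hard_cap = hard_cap        # per-round limit that must never be exceeded
        self.soft_budget = soft_budget  # per-round cost target, may be violated transiently
        self.dual_step = dual_step      # step size for the dual variable (assumed constant)
        self.lam = 0.0                  # dual variable pricing the soft constraint
        self.counts = [0] * n_models
        self.reward_mean = [0.0] * n_models
        self.cost_mean = [0.0] * n_models
        self.t = 0

    def select(self, demand, known_footprint):
        """Choose a model for one round.

        known_footprint[i] is the assumed-known resource use of model i per
        unit of demand; it is what the hard constraint is checked against.
        """
        self.t += 1
        feasible = [i for i in range(self.n)
                    if known_footprint[i] * demand <= self.hard_cap]
        if not feasible:
            raise ValueError("no model satisfies the hard constraint at this demand")
        # Try every feasible model once before trusting the estimates.
        for i in feasible:
            if self.counts[i] == 0:
                return i

        def score(i):
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[i])
            optimistic_reward = self.reward_mean[i] + bonus
            optimistic_cost = max(0.0, self.cost_mean[i] - bonus)
            # Lagrangian index: reward minus the priced, demand-scaled cost.
            return optimistic_reward - self.lam * optimistic_cost * demand

        return max(feasible, key=score)

    def update(self, model, reward, cost, demand):
        """Record feedback and take a projected dual ascent step."""
        self.counts[model] += 1
        n = self.counts[model]
        self.reward_mean[model] += (reward - self.reward_mean[model]) / n
        self.cost_mean[model] += (cost - self.cost_mean[model]) / n
        self.lam = max(0.0, self.lam + self.dual_step * (cost * demand - self.soft_budget))


def simulate(rounds=200, seed=0):
    """Run the selector on synthetic feedback with demand alternating 1.0 / 3.0."""
    rng = random.Random(seed)
    footprint = [1.0, 2.0, 5.0]  # illustrative values only
    selector = ConstrainedUCBSelector(3, hard_cap=4.0, soft_budget=1.0)
    choices = []
    for t in range(rounds):
        demand = 1.0 if t % 2 == 0 else 3.0
        model = selector.select(demand, footprint)
        selector.update(model, rng.random(), rng.random(), demand)
        choices.append((model, demand))
    return selector, footprint, choices
```

In this sketch the hard constraint is never violated by construction, because infeasible models are removed before scoring, while the soft constraint is only discouraged through `lam`, which grows when the demand-scaled cost exceeds `soft_budget` and shrinks toward zero otherwise.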
LLM, bandits, online selection