Training · arXiv cs.AI · 9 d ago

Policy Regret for Embedding Model Routing: Contextual Bandits with Low-Rank Experts

The paper frames embedding model routing in recommendation systems as an adversarial contextual linear bandit problem with low-rank experts. It introduces the Hypentropy Policy Gradient (HPG) algorithm, which achieves a linearized policy regret of $\tilde{\mathcal{O}}(s\sqrt{MT})$ under incomplete information and structural misspecification. For practitioners, the result offers a structured way to route queries dynamically across multiple embedding models in a recommendation system.
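To make the routing setup concrete, the sketch below routes each query to one of several models using only the reward of the model that was chosen. It uses the standard EXP3 algorithm as a stand-in and is not the paper's HPG algorithm: it ignores query context and low-rank structure. The class `Exp3Router`, the function `simulate`, and the mean rewards are hypothetical and chosen only for illustration.

```python
import math
import random


class Exp3Router:
    """EXP3 over a fixed set of candidate models (a stand-in, not HPG)."""

    def __init__(self, num_models, gamma=0.1, seed=0):
        self.num_models = num_models
        self.gamma = gamma  # exploration rate in (0, 1]
        self.log_weights = [0.0] * num_models
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax of the log-weights, mixed with uniform exploration.
        peak = max(self.log_weights)
        scaled = [math.exp(w - peak) for w in self.log_weights]
        total = sum(scaled)
        return [
            (1 - self.gamma) * s / total + self.gamma / self.num_models
            for s in scaled
        ]

    def choose(self):
        # Sample a model index and return it with its sampling probability.
        probs = self.probabilities()
        arm = self.rng.choices(range(self.num_models), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Bandit feedback: only the chosen model's reward is observed,
        # so use the importance-weighted estimate reward / prob.
        self.log_weights[arm] += self.gamma * (reward / prob) / self.num_models


def simulate(rounds=5000, seed=0):
    # Hypothetical mean rewards in [0, 1] for three candidate models.
    means = [0.1, 0.2, 0.9]
    router = Exp3Router(num_models=len(means), seed=seed)
    env = random.Random(seed + 1)
    for _ in range(rounds):
        arm, prob = router.choose()
        reward = 1.0 if env.random() < means[arm] else 0.0
        router.update(arm, prob, reward)
    return router.probabilities()


if __name__ == "__main__":
    print(simulate())
```

In this toy simulation the routing distribution concentrates on the model with the highest mean reward, while the `gamma` term keeps every model's selection probability at or above `gamma / num_models`.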

embedding · bandits · online learning
