Training
When Do We Need LLMs? A Diagnostic for Language-Driven Bandits
The paper introduces LLMP-UCB, a bandit algorithm that uses Large Language Models (LLMs) to derive uncertainty estimates for Contextual Multi-Armed Bandits (CMABs) in non-episodic decision-making settings. In the experiments, lightweight numerical bandits operating on text embeddings match or exceed the accuracy of LLM-based approaches at a much lower computational cost. The paper also presents a geometric diagnostic that helps practitioners decide when LLM-driven reasoning is warranted and when a simpler numerical method suffices, supporting cost-effective, uncertainty-aware decision-making in AI applications.
bandits, llm, uncertainty, decision-making
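To make the "lightweight numerical bandit" in the summary concrete, here is a minimal sketch of disjoint LinUCB, a standard contextual UCB algorithm. It is not the paper's LLMP-UCB, and the summary does not say which numerical baseline the authors used, so LinUCB is an assumption chosen for illustration. The two-dimensional context vectors stand in for real text embeddings, and the names `LinUCBArm`, `select_arm`, and `run_demo` are hypothetical.

```python
import math


class LinUCBArm:
    """Per-arm ridge-regression state for disjoint LinUCB (illustrative only)."""

    def __init__(self, dim, ridge=1.0):
        # Keep the inverse of A = ridge * I + sum(x x^T) directly.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        # b accumulates reward-weighted context vectors.
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.A_inv]

    def ucb(self, x, alpha):
        v = self._a_inv_times(x)
        # Estimated reward theta . x, with theta = A_inv b (A_inv is symmetric).
        mean = sum(bi * vi for bi, vi in zip(self.b, v))
        # Exploration bonus: sqrt(x^T A_inv x).
        width = math.sqrt(max(sum(xi * vi for xi, vi in zip(x, v)), 0.0))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv after adding x x^T to A.
        v = self._a_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= v[i] * v[j] / denom
        for i in range(len(x)):
            self.b[i] += reward * x[i]


def select_arm(arms, x, alpha=1.0):
    """Return the index of the arm with the highest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


def run_demo(rounds=50):
    # Toy stand-ins for text embeddings: two orthogonal context vectors.
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    arms = [LinUCBArm(dim=2), LinUCBArm(dim=2)]
    for t in range(rounds):
        idx = t % 2                      # alternate contexts deterministically
        x = contexts[idx]
        chosen = select_arm(arms, x)
        # Assumed toy reward: arm i is correct exactly for context i.
        reward = 1.0 if chosen == idx else 0.0
        arms[chosen].update(x, reward)
    return arms


if __name__ == "__main__":
    trained = run_demo()
    print(select_arm(trained, [1.0, 0.0]), select_arm(trained, [0.0, 1.0]))
```

In a real setting the context `x` would be the embedding of the text describing the decision, and `alpha` would control how strongly the uncertainty term drives exploration. Each round costs a few small matrix-vector products, which is what makes this kind of baseline cheap relative to querying an LLM.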