Inference
MemBoost: A Memory-Boosted Framework for Cost-Aware LLM Inference
MemBoost is a framework for reducing the inference cost of Large Language Models (LLMs) by reusing previously generated answers and retrieving supporting information from memory. A lightweight model handles queries first and escalates the challenging ones to a more powerful model when necessary; this design supports continual memory growth and cost-aware routing. According to the reported experiments, MemBoost substantially reduces the number of expensive large-model invocations while preserving answer quality, which makes it relevant to practitioners who want more efficient LLM deployments.
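The routing idea in the summary can be illustrated with a minimal sketch. This is not MemBoost's actual implementation: the class name `MemoryRouter`, both thresholds, the confidence score returned by the small model, the choice to store every new answer, and the use of lexical similarity from `difflib` in place of a real retriever are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple


@dataclass
class MemoryRouter:
    """Illustrative router: memory reuse first, then small model, then large model."""

    # Hypothetical stand-ins: small_model returns (answer, confidence in [0, 1]),
    # large_model returns an answer string.
    small_model: Callable[[str], Tuple[str, float]]
    large_model: Callable[[str], str]
    reuse_threshold: float = 0.9       # assumed similarity cutoff for answer reuse
    confidence_threshold: float = 0.7  # assumed cutoff below which we escalate
    memory: List[Tuple[str, str]] = field(default_factory=list)
    large_calls: int = 0               # counts expensive invocations

    def _lookup(self, query: str) -> Optional[str]:
        """Return the stored answer for the most similar past query, if close enough."""
        best_score, best_answer = 0.0, None
        for stored_query, stored_answer in self.memory:
            # Lexical similarity as a stand-in for embedding-based retrieval.
            score = SequenceMatcher(None, query.lower(), stored_query.lower()).ratio()
            if score > best_score:
                best_score, best_answer = score, stored_answer
        return best_answer if best_score >= self.reuse_threshold else None

    def answer(self, query: str) -> Tuple[str, str]:
        """Return (answer, source), where source is 'memory', 'small', or 'large'."""
        cached = self._lookup(query)
        if cached is not None:
            return cached, "memory"
        draft, confidence = self.small_model(query)
        if confidence >= self.confidence_threshold:
            self.memory.append((query, draft))  # assumption: all new answers are stored
            return draft, "small"
        # Low confidence: escalate to the expensive model and remember its answer.
        self.large_calls += 1
        final = self.large_model(query)
        self.memory.append((query, final))
        return final, "large"


# Toy models, purely for demonstration.
def toy_small(query: str) -> Tuple[str, float]:
    return ("4", 0.95) if query == "What is 2 + 2?" else ("unsure", 0.2)


def toy_large(query: str) -> str:
    return f"detailed answer to: {query}"


router = MemoryRouter(toy_small, toy_large)
for q in ["What is 2 + 2?", "Explain cache invalidation.", "Explain cache invalidation"]:
    print(q, "->", router.answer(q))
print("large-model calls:", router.large_calls)
```

In this toy run the near-duplicate third query is served from memory, so the large model is invoked only once. In a real deployment the thresholds would trade cost against answer quality and would need tuning on representative traffic.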
cost-aware, llm, memory-boosted