Inference · arXiv cs.CL · 12 d ago

MemBoost: A Memory-Boosted Framework for Cost-Aware LLM Inference

MemBoost is a framework for reducing the inference cost of Large Language Models (LLMs) by reusing earlier answers and efficiently retrieving supporting information. It pairs a lightweight model with a more powerful one: the lightweight model escalates a query to the larger model only when the query is too challenging for it, and the design supports continual memory growth and cost-aware routing. According to the reported experiments, MemBoost significantly reduces the number of expensive large-model invocations while preserving answer quality, which makes it relevant to practitioners trying to lower the cost of LLM deployments.
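
The routing idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `MemoryRouter`, the two thresholds, the string-similarity lookup via `difflib`, and the choice to store only escalated answers are all assumptions made for illustration, and the summary does not say how MemBoost measures similarity or confidence.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple


@dataclass
class MemoryRouter:
    """Hypothetical router: reuse from memory, else small model, else escalate."""

    # Assumed interface: the small model returns (answer, confidence in [0, 1]).
    small_model: Callable[[str, List[str]], Tuple[str, float]]
    # Assumed interface: the large model returns an answer string.
    large_model: Callable[[str], str]
    reuse_threshold: float = 0.9       # assumed cutoff for reusing a stored answer
    confidence_threshold: float = 0.7  # assumed cutoff below which we escalate
    memory: List[Tuple[str, str]] = field(default_factory=list)
    large_calls: int = 0               # counts expensive invocations

    def _best_match(self, query: str) -> Tuple[Optional[Tuple[str, str]], float]:
        """Return the most similar stored (question, answer) pair and its score."""
        best, best_score = None, 0.0
        for stored_q, stored_a in self.memory:
            score = SequenceMatcher(None, query.lower(), stored_q.lower()).ratio()
            if score > best_score:
                best, best_score = (stored_q, stored_a), score
        return best, best_score

    def answer(self, query: str) -> Tuple[str, str]:
        """Return (answer, source), where source is 'memory', 'small', or 'large'."""
        match, score = self._best_match(query)
        # 1. Answer reuse: a near-identical question was already answered.
        if match is not None and score >= self.reuse_threshold:
            return match[1], "memory"
        # 2. Otherwise pass the closest stored answer as supporting context.
        context = [match[1]] if match is not None else []
        draft, confidence = self.small_model(query, context)
        if confidence >= self.confidence_threshold:
            return draft, "small"
        # 3. Escalate, then grow the memory with the large model's answer.
        result = self.large_model(query)
        self.large_calls += 1
        self.memory.append((query, result))
        return result, "large"
```

With stub models plugged in, a repeated hard question would be escalated once and then served from memory, so `large_calls` stays at one. A real system would likely replace the string-similarity lookup with embedding-based retrieval; that substitution is likewise an assumption rather than something stated in the summary.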

cost-aware · llm · memory-boosted
