RAG · arXiv cs.CL · 16 d ago

CacheWeaver: Cache-Aware Evidence Ordering for Efficient Grounded RAG Inference

CacheWeaver is a method for Retrieval-Augmented Generation (RAG) that orders evidence in the prompt in a cache-aware way, so that more of each prompt can reuse previously cached prefixes. It uses a prefix tree to prioritize the most reusable prefixes when ordering evidence, and it reduces median time-to-first-token (TTFT) by 20-33% across several vLLM configurations while maintaining answer quality. For practitioners, this is a lightweight way to speed up inference: it changes neither the underlying serving engine nor the evidence sets.
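The summary above does not spell out the algorithm, so the following is only a minimal sketch of the general idea of prefix-tree-guided evidence ordering, not CacheWeaver's actual implementation. It assumes each evidence document has a stable ID, that past prompts are recorded as ordered ID sequences, and that a greedy walk down the tree is an acceptable ordering rule. The names `PrefixTrie`, `TrieNode`, `hits`, `insert`, and `order` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by document ID (hypothetical representation).
    children: dict = field(default_factory=dict)
    # Number of recorded prompts whose evidence order passed through this node.
    hits: int = 0


class PrefixTrie:
    """Records past evidence orderings and suggests a reuse-friendly order."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, doc_ids):
        """Record the evidence order that was actually sent to the engine."""
        node = self.root
        for doc_id in doc_ids:
            node = node.children.setdefault(doc_id, TrieNode())
            node.hits += 1

    def order(self, retrieved):
        """Reorder `retrieved` so it follows the most-used recorded prefix.

        The evidence set is unchanged; only the order differs. Documents
        that extend no recorded prefix keep their retrieval order at the end.
        """
        remaining = list(retrieved)
        ordered = []
        node = self.root
        while remaining:
            # Documents that would extend a previously seen prefix.
            candidates = [d for d in remaining if d in node.children]
            if not candidates:
                break
            # Greedy choice: the child reached most often so far.
            best = max(candidates, key=lambda d: node.children[d].hits)
            ordered.append(best)
            remaining.remove(best)
            node = node.children[best]
        return ordered + remaining


trie = PrefixTrie()
trie.insert(["d1", "d2", "d3"])
trie.insert(["d1", "d2", "d4"])
reordered = trie.order(["d4", "d2", "d1"])  # same documents, new order
```

In this sketch the reordered list begins with `d1`, `d2`, which matches both recorded prompts, so an engine with prefix caching could in principle reuse the cached computation for that shared prefix. How CacheWeaver scores reusability and balances it against answer quality is not stated in the summary.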

RAG · cache · inference · relevance 0.00 · engagement 0.00
