Research · arXiv cs.AI · 12 d ago

Models Take Notes at Prefill: KV Cache Can Be Editable and Composable

The paper proposes a way to make prefix-cached key/value (KV) entries in language models editable and composable. When a cached note changes, the method edits only the affected notes instead of invalidating the entire cache, and it preserves decision consistency. The reported results are latency up to 14.9x lower than recompute baselines, with a cache hit rate of 98.5%. According to the summary, the approach applies across model architectures and targets online serving, which makes it relevant to practitioners optimizing LLM inference.
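The summary does not spell out the mechanism, so the following is only a toy sketch under our own assumptions, not the paper's method. It contrasts a conventional token-prefix cache, which must recompute everything after the first changed token, with a cache that stores each note's entries separately and concatenates them, so that editing one note recomputes only that note. The names `fake_prefill`, `prefix_reuse`, and `SegmentKVCache` are hypothetical. Strings stand in for KV tensors, and the sketch ignores positional re-encoding and cross-note attention, both of which a real implementation would have to handle.

```python
import hashlib
from dataclasses import dataclass, field


def fake_prefill(text: str) -> list[str]:
    """Hypothetical stand-in for a model prefill pass: one placeholder
    'KV entry' per whitespace token instead of real key/value tensors."""
    return [f"kv({tok})" for tok in text.split()]


def prefix_reuse(old: list[str], new: list[str]) -> int:
    """Tokens a conventional prefix cache can reuse: only the longest
    common prefix survives an edit; everything after it is recomputed."""
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n


@dataclass
class SegmentKVCache:
    """Toy per-note cache keyed by content hash (an illustrative
    assumption, not the paper's design)."""
    store: dict[str, list[str]] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0
    tokens_prefilled: int = 0

    def get_or_prefill(self, note: str) -> list[str]:
        # Identical note text maps to the same key, so unchanged notes hit.
        key = hashlib.sha256(note.encode("utf-8")).hexdigest()
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = fake_prefill(note)
            self.tokens_prefilled += len(self.store[key])
        return self.store[key]

    def compose(self, notes: list[str]) -> list[str]:
        """Concatenate cached per-note entries in order. A real system would
        also have to reconcile positions and cross-note attention."""
        kv: list[str] = []
        for note in notes:
            kv.extend(self.get_or_prefill(note))
        return kv


notes = ["user prefers tea", "order 42 shipped", "refund pending"]
cache = SegmentKVCache()
cache.compose(notes)  # cold cache: every note is prefilled

# Edit only the first note, then rebuild the composed cache.
edited = ["user prefers coffee"] + notes[1:]
before = cache.tokens_prefilled
kv = cache.compose(edited)

old_tokens = " ".join(notes).split()
new_tokens = " ".join(edited).split()
print("prefix cache recomputes:", len(new_tokens) - prefix_reuse(old_tokens, new_tokens))
print("segment cache recomputes:", cache.tokens_prefilled - before)
```

Running it prints 6 recomputed tokens for the prefix cache versus 3 for the per-note cache: the edit falls in the first note, so the prefix cache keeps only the first two tokens and discards the rest, while the per-note cache re-prefills just the edited note.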

Tags: caching, editable, composable, llm, notes
