ai-digest.dev
last updated 13 h ago
Inference · arXiv cs.AI · 4 d ago

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

RKSC (Reasoning-Aware KV Cache Sharing) is a training-free inference framework that removes structural redundancy in multi-branch LLM reasoning pipelines. It combines three components: ASKS, which shares KV cache entries based on hidden-state cosine similarity; CGEE, which applies confidence-gated early exit during inference; and RSBCM, which manages cache growth. Across five model families (7B-10B) and several benchmarks, it reports an average speedup of 3.008x over No-KV baselines. Because it needs no fine-tuning or architectural modification, practitioners can apply it to existing LLM deployments.
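The summary names two decision rules without giving their details: reuse a KV cache when hidden states are similar enough, and stop early when the model is confident enough. The sketch below illustrates the general idea only and is not the paper's implementation. The function names, the `threshold` and `min_confidence` defaults, and the choice of maximum softmax probability as the confidence signal are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def can_share_kv(hidden_a: Sequence[float], hidden_b: Sequence[float],
                 threshold: float = 0.95) -> bool:
    """Hypothetical sharing rule: reuse a cache when hidden states align.

    The 0.95 threshold is an assumed value, not taken from the paper.
    """
    return cosine_similarity(hidden_a, hidden_b) >= threshold


def max_softmax_prob(logits: Sequence[float]) -> float:
    """Probability of the most likely token under a softmax over logits."""
    peak = max(logits)
    # Subtract the peak before exponentiating for numerical stability.
    exps = [math.exp(v - peak) for v in logits]
    return max(exps) / sum(exps)


def should_exit_early(logits: Sequence[float],
                      min_confidence: float = 0.9) -> bool:
    """Hypothetical exit gate: stop once the top token is confident enough.

    The 0.9 cutoff is an assumed value, not taken from the paper.
    """
    return max_softmax_prob(logits) >= min_confidence
```

In a multi-branch pipeline, a check like `can_share_kv` would run before recomputing a branch's cache, and a gate like `should_exit_early` would run after each decoding or reasoning step. Both thresholds trade speed against fidelity and would need tuning per model.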

KV cache · LLM · multi-step