RAG
Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
The paper introduces "Reroute," a training-free plugin for vision-language models (VLMs) that replaces the usual rank-and-remove approach to visual token reduction with recoverable routing. At each routing decision, the selected visual tokens pass through the decoder blocks while the remaining tokens are deferred rather than discarded, so they can re-enter the candidate pool at later routing decisions. Experiments on LLaVA-1.5 and Qwen backbones show that Reroute improves grounding performance under aggressive token reduction without sacrificing overall visual question answering (VQA) capability, which suggests treating token reduction as a reversible routing decision rather than a permanent removal.
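The difference between rank-and-remove and recoverable routing can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the per-stage scores, and the fixed budget `k` are all hypothetical, and the summary does not say how tokens are scored or where routing decisions are placed. The sketch only shows the pool bookkeeping: in the baseline a dropped token leaves the pool for good, while in the recoverable variant deferred tokens remain candidates at the next decision.

```python
from typing import List, Sequence

Scores = Sequence[float]


def top_k(candidates: Sequence[int], scores: Scores, k: int) -> List[int]:
    """Return the k highest-scoring candidate indices, in ascending index order."""
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def rank_and_remove(stage_scores: List[Scores], k: int) -> List[List[int]]:
    """Baseline: tokens dropped at one stage are never considered again."""
    pool = list(range(len(stage_scores[0])))
    history = []
    for scores in stage_scores:
        kept = top_k(pool, scores, k)
        history.append(kept)
        pool = kept  # removed tokens leave the candidate pool permanently
    return history


def recoverable_routing(stage_scores: List[Scores], k: int) -> List[List[int]]:
    """Assumed recoverable variant: deferred tokens stay in the candidate pool."""
    pool = list(range(len(stage_scores[0])))
    history = []
    for scores in stage_scores:
        routed = top_k(pool, scores, k)  # these tokens pass through the block
        history.append(routed)
        # Deferred tokens are not deleted, so the pool is left unchanged.
    return history


# Hypothetical relevance scores for 4 visual tokens at 2 routing decisions.
stages = [
    [0.9, 0.8, 0.1, 0.2],  # decision 1: tokens 0 and 1 look most relevant
    [0.1, 0.7, 0.9, 0.3],  # decision 2: token 2 has become the most relevant
]

removed = rank_and_remove(stages, k=2)
rerouted = recoverable_routing(stages, k=2)
print(removed)
print(rerouted)
```

In this toy data, token 2 scores poorly at the first decision and highly at the second. The baseline cannot select it again once it has been dropped, whereas the recoverable variant can, which is the behavior the summary describes as deferred tokens re-entering the candidate pool.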
visual tokens, routing, vision-language models