RAG · arXiv cs.AI · 8 d ago

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

The paper introduces "Reroute," a training-free plugin for vision-language models (VLMs) that replaces the usual rank-and-remove approach to visual token reduction with recoverable routing. At each routing step, the selected visual tokens pass through the decoder blocks while the rest are deferred rather than discarded, so they can re-enter the candidate pool at subsequent routing decisions. Experiments on LLaVA-1.5 and Qwen backbones show that Reroute improves grounding performance under aggressive token reduction without sacrificing overall visual question answering (VQA) performance, which suggests that token reduction in VLMs can be treated as reversible routing rather than permanent pruning.
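The difference between recoverable routing and rank-and-remove can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the per-stage score lists, and the fixed budget `k` are all hypothetical, and the summary does not say how tokens are scored or how deferred tokens are carried forward. The sketch simply assumes that a deferred token skips the current block unchanged and stays eligible at the next routing decision.

```python
from typing import List, Tuple


def recoverable_routing(
    num_tokens: int, stage_scores: List[List[float]], k: int
) -> Tuple[List[List[int]], List[int]]:
    """Toy recoverable routing (hypothetical, not the paper's algorithm).

    At every stage all tokens are candidates, including ones deferred
    earlier. The top-k by that stage's score pass through the block;
    the rest are deferred, not deleted.
    """
    depth = [0] * num_tokens  # number of blocks each token has passed through
    routed = []
    for scores in stage_scores:
        ranked = sorted(range(num_tokens), key=lambda i: scores[i], reverse=True)
        chosen = sorted(ranked[:k])
        for i in chosen:
            depth[i] += 1  # stand-in for running the token through a decoder block
        routed.append(chosen)
    return routed, depth


def rank_and_remove(
    num_tokens: int, stage_scores: List[List[float]], k: int
) -> List[List[int]]:
    """Toy rank-and-remove baseline: a dropped token never comes back."""
    alive = list(range(num_tokens))
    kept = []
    for scores in stage_scores:
        survivors = sorted(alive, key=lambda i: scores[i], reverse=True)[:k]
        alive = sorted(survivors)
        kept.append(list(alive))
    return kept


if __name__ == "__main__":
    # Made-up relevance scores for 4 visual tokens at 2 routing stages.
    scores = [
        [0.9, 0.1, 0.5, 0.3],  # stage 1: tokens 0 and 2 rank highest
        [0.2, 0.8, 0.6, 0.1],  # stage 2: token 1 becomes the most relevant
    ]
    print(recoverable_routing(4, scores, k=2))  # ([[0, 2], [1, 2]], [1, 1, 2, 0])
    print(rank_and_remove(4, scores, k=2))      # [[0, 2], [0, 2]]
```

In this toy run, token 1 is deferred at the first stage and selected at the second under recoverable routing, whereas the rank-and-remove baseline has already discarded it and cannot bring it back.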

visual tokens · routing · vision-language models
