RAG · arXiv cs.AI · 2 d ago

One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA

The paper introduces Latent Memory, a memory paradigm for question answering (QA) that replaces each raw text or image evidence item with a single high-dimensional latent token produced by a compressor LLM/VLM. This cuts the number of tokens the generator must process in resource-constrained settings. Retrieval and generation share one unified latent representation space. Across seven text-only and multimodal QA benchmarks, the approach achieves competitive performance while consuming 3× to 10× fewer generator tokens than existing retrieval-augmented generation (RAG) methods. For practitioners, the main implication is lower generator-side token cost, and therefore more efficient deployment, in multimodal QA applications.
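As a rough illustration of the mechanism described above, here is a toy Python sketch. Everything in it is an assumption rather than the paper's method. `toy_compress` is a hypothetical stand-in for the compressor LLM/VLM: it builds a hashed bag-of-words vector instead of running a learned model, and it handles text only, whereas image evidence would need a VLM compressor. `LatentMemory` and `generator_tokens` are illustrative names, and whitespace-separated words stand in for tokens. The sketch shows the two ideas in the summary. First, every evidence item collapses to exactly one vector. Second, the query is embedded in that same space for retrieval.

```python
import hashlib
from dataclasses import dataclass

DIM = 256  # hypothetical latent width; a real compressor would use its hidden size


def toy_compress(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for the compressor: one evidence item -> ONE vector.

    Uses hashed bag-of-words counts, L2-normalized (all-zero for empty text).
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are already unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


@dataclass
class LatentMemory:
    keys: list[list[float]]  # one latent vector ("token") per evidence item
    raw_lengths: list[int]   # raw word counts, kept only to compare budgets

    @classmethod
    def build(cls, evidence: list[str]) -> "LatentMemory":
        return cls(
            keys=[toy_compress(e) for e in evidence],
            raw_lengths=[len(e.split()) for e in evidence],
        )

    def retrieve(self, query: str, k: int) -> list[int]:
        # The query goes through the same compressor, so retrieval happens in
        # the same latent space that holds the stored evidence vectors.
        q = toy_compress(query)
        ranked = sorted(
            range(len(self.keys)),
            key=lambda i: cosine(q, self.keys[i]),
            reverse=True,
        )
        return ranked[:k]


def generator_tokens(question: str, retrieved: list[int],
                     mem: LatentMemory, latent: bool) -> int:
    """Approximate generator input length (whitespace words as a token proxy)."""
    q_tokens = len(question.split())
    if latent:
        return q_tokens + len(retrieved)  # exactly one latent token per item
    return q_tokens + sum(mem.raw_lengths[i] for i in retrieved)  # raw-text RAG


evidence = [
    "The Eiffel Tower is in Paris and was completed in 1889 for the World's Fair.",
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Paris is the capital of France and lies on the Seine river.",
]
mem = LatentMemory.build(evidence)
question = "When was the Eiffel Tower in Paris completed?"
top = mem.retrieve(question, k=2)
latent = generator_tokens(question, top, mem, latent=True)
raw = generator_tokens(question, top, mem, latent=False)
print("retrieved:", top)
print(f"generator tokens: raw={raw}, latent={latent}, ratio={raw / latent:.1f}x")
```

The ratio printed here reflects only these toy passages. It does not reproduce the paper's measured 3× to 10× savings, which will also depend on question and evidence lengths. The summary does not say how the generator ingests a latent token, so the sketch only counts it.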

llm · memory · qa · multimodal
