Multimodal
What Should a Streaming Video Model Remember?
The paper introduces SelectStream, a selective latent-memory framework for streaming video understanding. It decides what to keep in a limited memory budget so that stored context helps answer queries without diluting perception of the current scene. It combines three mechanisms: surprise-driven adaptive windowing, priority-preserving consolidation, and query-conditioned graph reasoning. Together these maintain a fixed-capacity latent memory graph. In experiments, SelectStream reaches 82.67% on StreamingBench and outperforms the existing methods it is compared against. This makes it relevant to practitioners who need efficient memory management in real-time video processing.
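The summary names the mechanisms but not their formulations. The following is a minimal, hypothetical sketch of how a fixed-capacity streaming memory could combine them; it is not SelectStream's implementation. The assumptions are these:

- "Surprise" is taken as 1 minus the cosine similarity between a new frame feature and the mean of the current window.
- A node's priority is the peak surprise seen in its window.
- Consolidation folds the lowest-priority node into its more similar temporal neighbour, so high-priority nodes are merged last.
- Query-conditioned graph reasoning is replaced by plain cosine top-k retrieval.

All names (`FixedCapacityMemory`, `MemoryNode`, `surprise_threshold`) are illustrative.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch, NOT the SelectStream implementation.

@dataclass
class MemoryNode:
    feature: list[float]  # mean feature of the frames in one window
    priority: float       # assumed importance: peak surprise in the window
    count: int = 1        # number of windows merged into this node

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

class FixedCapacityMemory:
    def __init__(self, capacity, surprise_threshold):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self.threshold = surprise_threshold
        self.nodes = []          # consolidated memory, in temporal order
        self.window = []         # frames of the currently open window
        self.window_priority = 0.0

    def observe(self, frame):
        # Surprise proxy (assumption): dissimilarity to the open window's mean.
        surprise = 1.0 - cosine(frame, mean(self.window)) if self.window else 1.0
        if self.window and surprise > self.threshold:
            self._close_window()  # adaptive boundary: a surprising frame starts a new window
        self.window.append(frame)
        self.window_priority = max(self.window_priority, surprise)

    def flush(self):
        # Close the open window at the end of the stream.
        if self.window:
            self._close_window()

    def _close_window(self):
        self.nodes.append(MemoryNode(mean(self.window), self.window_priority))
        self.window, self.window_priority = [], 0.0
        if len(self.nodes) > self.capacity:
            self._consolidate()

    def _consolidate(self):
        # Fold the lowest-priority node into its more similar adjacent neighbour;
        # the merged node keeps the higher priority of the pair.
        i = min(range(len(self.nodes)), key=lambda k: self.nodes[k].priority)
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(self.nodes)]
        j = max(neighbours, key=lambda k: cosine(self.nodes[i].feature, self.nodes[k].feature))
        lo, hi = sorted((i, j))
        a, b = self.nodes[lo], self.nodes[hi]
        total = a.count + b.count
        merged = [(x * a.count + y * b.count) / total for x, y in zip(a.feature, b.feature)]
        self.nodes[lo:hi + 1] = [MemoryNode(merged, max(a.priority, b.priority), total)]

    def retrieve(self, query, k=2):
        # Stand-in for query-conditioned graph reasoning: cosine top-k over nodes.
        return sorted(self.nodes, key=lambda n: cosine(query, n.feature), reverse=True)[:k]

mem = FixedCapacityMemory(capacity=2, surprise_threshold=0.2)
stream = [[1, 0]] * 3 + [[0, 1]] * 3 + [[1, 1]] * 3  # three toy "scenes"
for frame in stream:
    mem.observe(frame)
mem.flush()
print([(n.feature, n.count) for n in mem.nodes])  # → [([1.0, 0.0], 1), ([0.5, 1.0], 2)]
```

In this toy stream, each scene change exceeds the threshold and opens a new window, which yields three nodes. The third scene is the least surprising, so it has the lowest priority. It is therefore folded into its neighbour, and the first node survives untouched even though the memory holds only two nodes. Keeping the merged node's priority at the maximum of the pair is one way to stop important content from being diluted by repeated merges.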
video, streaming, memory, model