Models
LiveStarPro: Proactive Streaming Video Understanding with Hierarchical Memory for Long-Horizon Streams
LiveStarPro is a live streaming assistant for proactive video understanding. It addresses two challenges: processing continuous streams and maintaining contextual memory over long horizons. It has three key components: Streaming Verification Decoding (SVeD) for response timing, Streaming Causal Attention Masks (SCAM) for video-language alignment, and Tree-Structured Hierarchical Memory (TSHM) for efficient retrieval of historical data. Compared with existing methods, LiveStarPro shows a 28.9% increase in semantic correctness, an 18.2% reduction in timing error, and a 1.58x inference speedup, which makes it relevant to practitioners working with long-duration video data.
video understanding, long-horizon streams, video-llms