Multimodal
DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation
DySink is a retrieval-based framework for autoregressive long video generation. Rather than relying on static early-frame sinks, it dynamically selects visually relevant historical frames to serve as sinks. A memory bank and a sink anomaly gate improve adaptability and prevent collapse in the generated content. In experiments on minute-long videos, DySink improves dynamic degree and temporal quality over strong baselines. Code and model weights are available.
video generation autoregressive
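
To make the idea of dynamic sinks concrete, here is a minimal sketch of retrieving sink frames from a memory bank by visual similarity. It is not DySink's implementation: the function names (`cosine_sim`, `select_sinks`), the cosine-similarity retrieval, the `min_sim` threshold, and the fallback to a static first frame are all assumptions made for illustration. In particular, the gate shown here is a simple stand-in and may differ from how the summary's sink anomaly gate actually works.

```python
import numpy as np


def cosine_sim(query: np.ndarray, bank: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of the bank."""
    q = query / (np.linalg.norm(query) + 1e-8)
    b = bank / (np.linalg.norm(bank, axis=1, keepdims=True) + 1e-8)
    return b @ q


def select_sinks(query, bank, k=2, min_sim=0.2, static_sinks=(0,)):
    """Pick up to k historical frames to act as sinks for the current step.

    query:        embedding of the frame being generated (hypothetical feature)
    bank:         (num_frames, dim) array of stored frame embeddings
    min_sim:      hypothetical gate threshold, an assumption of this sketch
    static_sinks: fallback indices, mimicking a static early-frame sink
    """
    sims = cosine_sim(query, bank)
    # Retrieve the k most similar historical frames.
    top = np.argsort(-sims)[:k]
    # Stand-in gate: discard retrieved frames that are too dissimilar.
    kept = [int(i) for i in top if sims[i] >= min_sim]
    # If nothing passes the gate, fall back to the static sinks.
    return kept if kept else list(static_sinks)


# Toy memory bank of four 2-D frame embeddings.
bank = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
print(select_sinks(np.array([1.0, 0.0]), bank))   # frames similar to the query
print(select_sinks(np.array([0.0, -1.0]), bank))  # nothing relevant: fallback
```

In the first call the two frames closest to the query are returned. In the second call no stored frame clears the threshold, so the sketch falls back to the static first frame.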