Inference
TetherCache: Stabilizing Autoregressive Long-Form Video Generation with Gated Recall and Trusted Alignment
The paper introduces TetherCache, a cache management strategy for autoregressive long-form video generation that targets context distribution shift and visual artifacts. It combines two components: GRAB (Gated Recall with Attention-Diversity Balancing), which selects diverse historical frames, and TAME (Trusted Alignment via Memory Editing), which aligns recalled memory tokens with a trusted context. On the VBench-Long benchmark, TetherCache improves video quality metrics, most notably cutting quality drift from 7.84 to 1.33 (a reduction of about 83%) on 240-second videos. The approach is relevant to practitioners building stable, high-quality long-duration video generation models.
video generation, autoregressive models