Multimodal
Steady-Forcing: Balancing Spatial Persistence and Motion Continuity in Long-Horizon Nature Video Diffusion
The paper introduces Steady-Forcing, a framework for long-horizon nature video generation that targets the trade-off between stability (spatial persistence) and motion continuity in autoregressive video diffusion models. Its key components are a persistent visual anchor (V-Sink), an exponential-moving-average motion memory (EMA-Sink), and task-focused distillation from a Wan2.1-14B teacher. Together they improve background consistency and fluid dynamics over extended rollouts. For practitioners, the work offers a structured approach to improving fixed-camera video generation, and it highlights the need for task-specific benchmarks that can reliably detect static-camera artifacts.
video generation, diffusion, nature
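The summary names two memory components but does not say how they are implemented. The sketch below illustrates the general idea under stated assumptions. It treats per-frame features as plain vectors, freezes the first frame as a persistent anchor (a V-Sink analogue), and folds every later frame into an exponential moving average m_t = β·m_{t-1} + (1 − β)·x_t (an EMA-Sink analogue). The class name `SinkMemory`, the decay `beta`, and the choice to seed the EMA from the first post-anchor frame are all hypothetical and are not taken from the paper. In the actual method these memories likely act on attention state inside the diffusion model rather than on raw vectors.

```python
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


@dataclass
class SinkMemory:
    """Hypothetical sketch: a frozen visual anchor plus an EMA motion memory.

    Not the paper's implementation; names and defaults are illustrative.
    """
    beta: float = 0.9                # EMA decay (assumed value, not from the paper)
    anchor: Optional[Vector] = None  # V-Sink analogue: first frame, never updated
    ema: Optional[Vector] = None     # EMA-Sink analogue: running average of later frames

    def update(self, frame_feat: Vector) -> None:
        if self.anchor is None:
            # The first frame becomes the persistent spatial anchor.
            self.anchor = list(frame_feat)
            return
        if self.ema is None:
            # Seed the motion memory with the first post-anchor frame.
            self.ema = list(frame_feat)
        else:
            # m_t = beta * m_{t-1} + (1 - beta) * x_t
            self.ema = [
                self.beta * m + (1.0 - self.beta) * x
                for m, x in zip(self.ema, frame_feat)
            ]

    def context(self) -> List[Vector]:
        """Memory entries a generator could condition on: anchor first, then EMA."""
        return [v for v in (self.anchor, self.ema) if v is not None]


if __name__ == "__main__":
    mem = SinkMemory(beta=0.5)
    for feat in ([1.0, 1.0], [2.0, 0.0], [4.0, 2.0]):
        mem.update(feat)
    print(mem.context())  # [[1.0, 1.0], [3.0, 1.0]]
```

The split reflects the trade-off in the title. The anchor never changes, so it can keep pulling the scene back toward its initial layout. The EMA keeps drifting with recent frames, so it can carry motion forward without storing the full history. A smaller `beta` makes the memory follow recent motion more closely, and a larger one smooths it more.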