Multimodal
BiWM: Advancing Open-Source Interactive Video World Models with Bidirectional Autoregression
BiWM is introduced as the first full-stack framework for interactive video world models built on bidirectional autoregression. It cuts the training pipeline from four stages to two, which is reported to improve both generation quality and inference speed. The framework supports models ranging from Wan2.1-1.3B to LTX-2.3-22B and includes camera-control fine-tuning, pluggable history compression, and an optional 4-bit NVFP4 pipeline for training and inference. For practitioners, the main benefit is better controllability and fidelity in video generation, addressing limitations of existing causal models such as minWM.
video autoregressive models