Multimodal
ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation
The article introduces Argus, a framework for subject-preserving video generation built on Stacked Multi-View Identity Mosaic Injection (SMII), which strengthens identity representation across varied conditions. Argus uses a 3x3 stacked mosaic to turn identity evidence into a dynamic distribution, and improves robustness with techniques such as no-cross-pair counterfactual training and Temporal Identity Annealing. It achieves state-of-the-art scores on the OpenS2V-Eval Human-Domain and HardID-Celeb benchmarks and shows significant improvements in handling large viewpoint changes and occlusions, both of which matter for realistic video synthesis.
video generation, mosaic injection, mllm