Multimodal
AnchorEdit: Maintaining Temporal Consistency in Multi-turn Image Editing via Causal Memory
AnchorEdit is an autoregressive diffusion-based framework for high-resolution, multi-turn image editing that targets identity drift and error accumulation in iterative design. It is trained in three stages: identity-preserving pretraining, causal autoregressive fine-tuning with a self-rollout strategy, and consistency distillation, which together enable efficient generation across multiple editing steps. At inference, a memory mechanism keeps subject identity stable across long editing sequences, and AnchorEdit achieves state-of-the-art performance on a new benchmark designed to measure long-horizon stability in multi-turn editing.
image editing, causal memory, ml