Research
Rethinking Cross-Layer Information Routing in Diffusion Transformers
This paper introduces Diffusion-Adaptive Routing (DAR), an approach to cross-layer information routing in Diffusion Transformers (DiTs) that replaces traditional residual connections with a learnable, timestep-adaptive aggregation method. An empirical analysis reveals issues with conventional residual addition. On ImageNet 256×256, DAR improves the FID of SiT-XL/2 by 2.11 while requiring 8.75 times fewer training iterations. The method both accelerates training and preserves high-frequency detail during fine-tuning, pointing to cross-layer routing as a significant opportunity for optimization in diffusion model architectures.
transformers, information-routing
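To make the contrast in the summary concrete, the sketch below compares a conventional residual input (a plain sum of all earlier outputs) with a timestep-conditioned weighted aggregation. The summary does not give DAR's actual formulation, so everything here is an assumption for illustration: the softmax gating, the linear map from the timestep embedding to per-source logits, and the names `routing_weights`, `adaptive_input`, `W`, and `b` are all hypothetical. Each "source" is the feature vector of a single token from the embedding or an earlier block.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def residual_input(sources):
    """Conventional residual stream: unweighted sum of all earlier outputs."""
    return [sum(vals) for vals in zip(*sources)]


def routing_weights(t_emb, W, b):
    """Hypothetical gate: one logit per source, computed from the timestep
    embedding as logits[i] = b[i] + sum_k t_emb[k] * W[k][i]."""
    logits = [
        b[i] + sum(t_emb[k] * W[k][i] for k in range(len(t_emb)))
        for i in range(len(b))
    ]
    return softmax(logits)


def adaptive_input(sources, t_emb, W, b):
    """Assumed timestep-adaptive aggregation: a weighted sum of earlier
    outputs whose weights depend on the timestep embedding."""
    weights = routing_weights(t_emb, W, b)
    mixed = [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*sources)]
    return mixed, weights


# Toy example: three sources (embedding plus two earlier blocks), feature dim 2.
sources = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
W = [[0.5, -0.5, 0.0], [0.0, 1.0, -1.0]]  # learnable in practice; fixed here
b = [0.0, 0.0, 0.0]

t_early = [1.0, 0.0]  # stand-ins for two different timestep embeddings
t_late = [0.0, 1.0]

mixed_early, w_early = adaptive_input(sources, t_early, W, b)
mixed_late, w_late = adaptive_input(sources, t_late, W, b)
print("residual sum:", residual_input(sources))
print("early weights:", w_early)
print("late weights:", w_late)
```

In this sketch the residual sum treats every source identically at every timestep, whereas the gated version can weight the same sources differently for different timestep embeddings. With `W` set to zero the gate becomes uniform, and the result is the residual sum scaled by the number of sources.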