Research · arXiv cs.AI · 9 d ago

Distilling Drifting Transformers with Representation Autoencoders

This article presents Drift-RAE, a method that distills pretrained flow models within the drifting paradigm using Representation Autoencoders (RAEs), with the aim of improving training stability and performance. The authors report a Fréchet Inception Distance (FID) of 1.77 on ImageNet at 256×256 resolution after only 10,000 distillation steps. This result outperforms existing RAE distillation techniques and is competitive with the original Drifting Model. For practitioners, the work addresses convergence problems in distillation and offers a more stable way to exploit the rich semantic representations that RAEs provide.
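The headline result is an FID score, where lower is better. The sketch below shows what that metric computes: the standard Fréchet distance between two Gaussians, each fitted to a set of feature vectors. It is not the authors' evaluation code. Real FID uses Inception-v3 features over tens of thousands of images and fixed reference statistics, and neither is included here. The helper names `frechet_distance` and `fid_from_features` are hypothetical, chosen for illustration.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1) + Tr(sigma2) - 2 Tr((sigma1 sigma2)^{1/2})."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2
    root1 = _psd_sqrt(sigma1)
    # sigma1 sigma2 has the same eigenvalues as root1 sigma2 root1, which is symmetric PSD,
    # so the trace of its square root is the sum of square roots of those eigenvalues.
    inner = root1 @ sigma2 @ root1
    inner = (inner + inner.T) / 2.0  # remove asymmetry introduced by floating-point error
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def fid_from_features(feats_real, feats_gen) -> float:
    """FID-style score from two (n_samples, dim) feature arrays (hypothetical helper)."""
    feats_real = np.asarray(feats_real, dtype=float)
    feats_gen = np.asarray(feats_gen, dtype=float)
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sig_r = np.cov(feats_real, rowvar=False)  # columns are feature dimensions
    sig_g = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu_r, sig_r, mu_g, sig_g)
```

For example, two unit-covariance Gaussians whose means differ by the vector (3, 4) have a distance of 25. The covariance terms cancel, so only the squared mean gap remains. Identical feature sets give a distance of 0. A reported value such as 1.77 only makes sense relative to the standard Inception features and ImageNet reference statistics, which this toy version does not reproduce.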

distillation · transformers
