Research
DifFRACT: Diffusion Feature Reconstruction and Attribution for Circuit Tracing
The paper introduces DifFRACT, an approach to circuit tracing in multimodal diffusion transformers such as FLUX that advances mechanistic interpretability. It uses timestep-conditioned transcoders to model the MLP sublayers, which enables exact feature-to-feature attribution and reveals interpretable circuits showing how semantic information propagates across denoising steps. The authors report that these transcoders outperform standard sparse autoencoders on both sparsity and faithfulness. For practitioners, this offers a more precise framework for understanding and controlling multimodal generative models, including addressing systematic generation errors.
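The summary does not spell out DifFRACT's transcoder architecture, so the following is only a minimal sketch under stated assumptions. It assumes a ReLU transcoder whose encoder bias is shifted by a linear map of a sinusoidal timestep embedding. It also assumes two MLP sublayers connected by a direct residual path, with no attention or normalization in between. The names `TimestepTranscoder`, `feature_to_feature`, and `w_time` are hypothetical, as are all weights. The sketch illustrates why transcoder features can admit exact feature-to-feature attribution: under these assumptions, a downstream feature's pre-activation decomposes linearly into one term per upstream feature plus a residual term.

```python
import math
from dataclasses import dataclass

Vec = list[float]
Mat = list[list[float]]  # row-major: Mat[row][col]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Mat, v: Vec) -> Vec:
    return [dot(row, v) for row in m]


def timestep_embedding(t: float, dim: int) -> Vec:
    """Sinusoidal embedding of the denoising timestep t (dim assumed even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


@dataclass
class TimestepTranscoder:
    """Hypothetical transcoder: sparse features read an MLP sublayer's input
    and reconstruct that sublayer's output."""
    w_enc: Mat   # n_features x d_model
    b_enc: Vec   # n_features
    w_time: Mat  # n_features x d_time (assumed: timestep shifts the encoder bias)
    w_dec: Mat   # d_model x n_features
    b_dec: Vec   # d_model

    def pre_activations(self, x: Vec, t: float) -> Vec:
        emb = timestep_embedding(t, len(self.w_time[0]))
        return [a + b + c for a, b, c in
                zip(matvec(self.w_enc, x), self.b_enc, matvec(self.w_time, emb))]

    def encode(self, x: Vec, t: float) -> Vec:
        # ReLU zeroes negative pre-activations, yielding a sparse feature code
        return [max(0.0, p) for p in self.pre_activations(x, t)]

    def decode(self, f: Vec) -> Vec:
        # Linear reconstruction of the MLP sublayer's output
        return [y + b for y, b in zip(matvec(self.w_dec, f), self.b_dec)]


def feature_to_feature(up: TimestepTranscoder, f_up: Vec,
                       down: TimestepTranscoder, j: int) -> Vec:
    """Direct-path contribution of each upstream feature i to the
    pre-activation of downstream feature j: f_up[i] * (w_enc[j] . w_dec[:, i]).

    Assumes the downstream input is the upstream input plus the upstream
    transcoder output, with no attention or normalization in between."""
    w_j = down.w_enc[j]
    return [f_up[i] * dot(w_j, [row[i] for row in up.w_dec])
            for i in range(len(f_up))]


# Toy example with made-up weights (d_model = 2, d_time = 2)
up = TimestepTranscoder(
    w_enc=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], b_enc=[0.0, -0.5, 0.0],
    w_time=[[0.1, 0.0], [0.0, 0.1], [0.0, 0.0]],
    w_dec=[[0.5, 0.0, 1.0], [0.0, 2.0, -1.0]], b_dec=[0.1, 0.0],
)
down = TimestepTranscoder(
    w_enc=[[1.0, -1.0], [0.5, 0.5]], b_enc=[0.0, 0.2],
    w_time=[[0.0, 0.3], [0.2, 0.0]],
    w_dec=[[1.0, 0.0], [0.0, 1.0]], b_dec=[0.0, 0.0],
)
x, t = [1.0, 2.0], 0.5
f_up = up.encode(x, t)
x_next = [a + b for a, b in zip(x, up.decode(f_up))]  # residual stream after the upstream MLP
attr = feature_to_feature(up, f_up, down, j=0)
```

Once the upstream activations are fixed, every term is linear. The attributions therefore sum exactly to the downstream pre-activation when combined with a residual term. That residual term collects the downstream feature's response to the unchanged residual stream and the upstream decoder bias, plus its own bias and timestep shift. In a full transformer block, attention and normalization between MLP sublayers complicate this picture and would need separate handling.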
interpretability, neural-networks, causal-analysis