Multimodal
Detail++: Training-Free Detail Enhancer for Text-to-Image Diffusion Models
Detail++ is a training-free framework that enhances text-to-image (T2I) diffusion models through a Progressive Detail Injection (PDI) strategy, which decomposes a complex prompt into simplified sub-prompts for staged generation. The method relies on self-attention to establish the global composition and uses cross-attention together with a Centroid Alignment Loss to improve attribute consistency and reduce binding noise. Experiments show that Detail++ outperforms existing methods on T2I-CompBench and on a new style composition benchmark, which makes it particularly useful for practitioners working with complex multi-object prompts in T2I generation.
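To make the two ideas in the summary concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the summary does not specify how prompts are decomposed or how the Centroid Alignment Loss is defined. The sketch assumes that sub-prompts are built by adding one attribute per stage to a bare prompt, and that the loss is the squared distance between the attention-weighted centroids of an attribute token's map and its subject token's map. All function names (`staged_prompts`, `centroid`, `centroid_alignment_loss`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def staged_prompts(base: str, details: Sequence[Tuple[str, str]]) -> List[str]:
    """Return prompts that start from the bare scene and add one attribute per stage.

    ASSUMPTION: decomposition is modeled as simple string insertion; the
    summary does not say how the real method derives its sub-prompts.
    """
    prompts = [base]
    current = base
    for subject, attribute in details:
        # Prefix only the first occurrence of the subject with its attribute.
        current = current.replace(subject, f"{attribute} {subject}", 1)
        prompts.append(current)
    return prompts


def centroid(attn: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Attention-weighted mean (row, col) of a 2D map with non-negative weights."""
    total = sum(sum(row) for row in attn)
    if total <= 0:
        raise ValueError("attention map must have positive mass")
    r = sum(i * w for i, row in enumerate(attn) for w in row) / total
    c = sum(j * w for row in attn for j, w in enumerate(row)) / total
    return r, c


def centroid_alignment_loss(
    attr_map: Sequence[Sequence[float]],
    subject_map: Sequence[Sequence[float]],
) -> float:
    """Squared distance between the two centroids.

    ASSUMPTION: this squared-distance form is illustrative only; the exact
    loss used by Detail++ is not stated in the summary.
    """
    (r1, c1), (r2, c2) = centroid(attr_map), centroid(subject_map)
    return (r1 - r2) ** 2 + (c1 - c2) ** 2


if __name__ == "__main__":
    for p in staged_prompts("a dog and a cat", [("dog", "red"), ("cat", "striped")]):
        print(p)
    subject = [[0.0, 0.0], [0.0, 1.0]]    # mass in the bottom-right cell
    attribute = [[1.0, 0.0], [0.0, 0.0]]  # mass in the top-left cell
    print(centroid_alignment_loss(attribute, subject))
```

Under these assumptions, the loss is zero when an attribute's attention is centered on the same region as its subject and grows as the two drift apart, which is one plausible way to discourage an attribute from binding to the wrong object.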
text-to-image, diffusion models