STAR: SpatioTemporal Adaptive Reward Allocation for Text-to-Image RL Post-Training
The article presents SpatioTemporal Adaptive Reward (STAR) Allocation, a method for reinforcement learning (RL) post-training of text-to-image models. It addresses a limitation of existing methods, which apply uniform rewards across generative trajectories. STAR uses text-image attention to build dynamic spatial allocation maps that focus policy updates on the relevant image regions at different denoising steps. Implemented on Stable Diffusion 3.5 Medium, the method improves compositional semantic alignment and text rendering, reaching scores of 0.9759 on GenEval, 0.9757 on OCR, and 23.60 on PickScore. These results suggest its potential for improving the quality of text-to-image outputs in practical applications.
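To make the idea of an attention-derived allocation map concrete, the following is a minimal sketch, not the authors' implementation. The summary does not give STAR's exact formulation, so everything here is an assumption for illustration: the function names `allocation_map` and `weighted_policy_loss`, the choice to average attention over text tokens, the mean-1 normalization, and the simple policy-gradient-style loss are all hypothetical.

```python
import numpy as np


def allocation_map(attn: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Collapse text-to-image attention of shape (tokens, H, W) into an (H, W) map.

    Assumption: token attention is averaged, then rescaled so the map has
    mean 1, which keeps the overall reward magnitude roughly unchanged.
    """
    if attn.ndim != 3:
        raise ValueError("expected attention of shape (tokens, H, W)")
    saliency = attn.mean(axis=0)  # average over text tokens
    return saliency / (saliency.mean() + eps)


def weighted_policy_loss(log_prob: np.ndarray, advantage: float, attn: np.ndarray) -> float:
    """Spread one scalar advantage over image locations for a single denoising step.

    log_prob: per-location log-probabilities of the sampled step, shape (H, W).
    attn:     text-image attention captured at the same step, shape (tokens, H, W).
    """
    weights = allocation_map(attn)
    # Locations with more text attention receive a larger share of the update.
    return float(-(weights * advantage * log_prob).mean())
```

Because the attention is read at each denoising step, calling `weighted_policy_loss` once per step with that step's attention would give a map that changes over the trajectory. With uniform attention the weights are all approximately 1, so the sketch reduces to the uniform-reward baseline the summary describes.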