Training
Principled RL for Flow Matching Emerges from the Chunk-level Policy Optimization
The paper introduces Group Chunking Policy Optimization (GCPO), a chunk-level reinforcement learning approach for post-training flow-matching models for text-to-image (T2I) generation. Group Relative Policy Optimization (GRPO) suffers from inaccurate advantage attribution; GCPO addresses this by aggregating consecutive steps into coherent "chunks" and optimizing at the chunk level. The authors report up to 43% relative improvement in preference alignment and on standard T2I benchmarks. For practitioners, this offers a more effective optimization strategy for T2I post-training.
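The chunk-level idea can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the "steps" are sampling steps with a per-step log-probability ratio available, that chunks are fixed-size and non-overlapping, and that the objective is a PPO-style clipped surrogate. The function names, `chunk_size`, and `clip` are hypothetical and chosen only for illustration. The group-relative advantage (reward standardized within a group of samples) follows the usual GRPO formulation.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def chunk_log_ratios(step_log_ratios, chunk_size):
    """Sum per-step log(pi_new / pi_old) over consecutive, non-overlapping chunks.

    Assumption: fixed-size chunks; the last chunk may be shorter.
    """
    return [
        sum(step_log_ratios[i:i + chunk_size])
        for i in range(0, len(step_log_ratios), chunk_size)
    ]


def chunk_level_surrogate(group_step_log_ratios, rewards, chunk_size, clip=0.2):
    """Clipped surrogate objective with one importance ratio per chunk.

    group_step_log_ratios: one list of per-step log-ratios per sample in the group.
    rewards: one scalar reward per sample in the group.
    Assumption: PPO-style clipping applied to the chunk-level ratio.
    """
    advantages = group_relative_advantages(rewards)
    total, count = 0.0, 0
    for log_ratios, adv in zip(group_step_log_ratios, advantages):
        for chunk_lr in chunk_log_ratios(log_ratios, chunk_size):
            ratio = math.exp(chunk_lr)  # joint ratio over the whole chunk
            clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
            total += min(ratio * adv, clipped * adv)
            count += 1
    return total / count
```

Setting `chunk_size=1` in this sketch reduces it to a step-level objective, which makes the difference easy to compare: with larger chunks, each sample's advantage is applied to one ratio per chunk rather than to each step separately.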
reinforcement learning, policy optimization