Training · arXiv cs.AI · 12 d ago

Principled RL for Flow Matching Emerges from the Chunk-level Policy Optimization

The paper introduces Group Chunking Policy Optimization (GCPO), a chunk-level reinforcement learning method for post-training flow-matching models in text-to-image (T2I) generation. GCPO aggregates consecutive steps into coherent 'chunks' and optimizes at the chunk level, which addresses the inaccurate advantage attribution seen in Group Relative Policy Optimization (GRPO). The paper reports up to 43% relative improvement on standard T2I benchmarks and on preference alignment. For practitioners, this offers a more effective optimization strategy for T2I post-training.
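
To make the chunk-level idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not give GCPO's objective, so the fixed `chunk_size`, the summing of per-step log-ratios within a chunk, and the PPO-style clipping are all assumptions made for illustration, and every function name is hypothetical. Only the group-standardized advantage follows the commonly described GRPO recipe.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def chunk_log_ratios(step_log_ratios, chunk_size):
    """Sum per-step log-probability ratios over consecutive steps.

    Assumption: chunks have a fixed size; the last chunk may be shorter.
    """
    return [
        sum(step_log_ratios[i:i + chunk_size])
        for i in range(0, len(step_log_ratios), chunk_size)
    ]


def chunk_clipped_objective(step_log_ratios, advantage, chunk_size, clip=0.2):
    """Clipped surrogate applied once per chunk instead of once per step.

    Assumption: PPO-style clipping on the chunk-level importance ratio.
    """
    terms = []
    for log_ratio in chunk_log_ratios(step_log_ratios, chunk_size):
        ratio = math.exp(log_ratio)
        clipped = max(1.0 - clip, min(1.0 + clip, ratio))
        terms.append(min(ratio * advantage, clipped * advantage))
    return mean(terms)


# Hypothetical usage: three sampled images in one group, five denoising steps.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
objective = chunk_clipped_objective(
    [0.01, -0.02, 0.03, 0.00, 0.01], advantages[2], chunk_size=2
)
```

In this sketch the importance ratio is formed per chunk, so a single advantage is credited to a run of consecutive steps rather than to each step separately, which is the contrast with step-level GRPO that the summary describes.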

reinforcement learning · policy optimization · relevance 0.00 · engagement 0.00
