SD-GRPO: Verifiable Segment Decomposition for Long-Form Vision-Language Generation
The paper introduces Segment-Decomposed GRPO (SD-GRPO), an extension of Group Relative Policy Optimization (GRPO) for long-form vision-language (VL) generation. Standard GRPO assigns a single scalar advantage to an entire response. SD-GRPO instead z-normalizes rewards per segment, so each part of a long output is credited or penalized separately. The authors report significant gains across tasks including multi-panel dense captioning and scientific figure captioning, with the improvements most pronounced in settings with high semantic entanglement. This finer-grained credit assignment matters for practitioners building more accurate, context-aware multimodal models.
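The summary does not give SD-GRPO's exact formulation, so the following is only a minimal sketch of the contrast it describes, under stated assumptions. It assumes that each of the G sampled responses in a group is split into the same K aligned segments (for example, one per panel), that each segment receives its own verifiable reward, and that each segment index is z-normalized across the group. The function names `grpo_advantages`, `segment_advantages` and `token_advantages` are hypothetical and not the paper's code.

```python
from statistics import fmean, pstdev


def grpo_advantages(total_rewards, eps=1e-6):
    """Standard GRPO: one scalar advantage per response, z-normalized within the group."""
    mu, sigma = fmean(total_rewards), pstdev(total_rewards)
    return [(r - mu) / (sigma + eps) for r in total_rewards]


def segment_advantages(segment_rewards, eps=1e-6):
    """Hypothetical per-segment variant (an assumed reading of SD-GRPO).

    segment_rewards[g][k] is the reward of segment k in response g.
    Assumes all responses share the same K aligned segments; each segment
    index k is z-normalized across the group independently.
    """
    num_segments = len(segment_rewards[0])
    columns = [[resp[k] for resp in segment_rewards] for k in range(num_segments)]
    stats = [(fmean(col), pstdev(col)) for col in columns]  # (mean, std) per segment index
    return [
        [(resp[k] - stats[k][0]) / (stats[k][1] + eps) for k in range(num_segments)]
        for resp in segment_rewards
    ]


def token_advantages(seg_adv, segment_ids):
    """Give each token the advantage of the segment it belongs to.

    segment_ids[g][t] is the (assumed known) segment index of token t in response g.
    """
    return [[adv[k] for k in ids] for adv, ids in zip(seg_adv, segment_ids)]


if __name__ == "__main__":
    # Toy group: 4 responses, 2 segments each; every response gets exactly one segment right.
    rewards = [[1, 0], [1, 0], [0, 1], [0, 1]]
    totals = [sum(r) for r in rewards]   # every response scores 1 in total
    print(grpo_advantages(totals))       # [0.0, 0.0, 0.0, 0.0]: no learning signal
    print(segment_advantages(rewards))   # approx. [[1, -1], [1, -1], [-1, 1], [-1, 1]]
```

In this toy group every response earns the same total reward, so the scalar advantage is zero for all of them and the policy update carries no signal. The per-segment advantages still push up the segment each response got right and push down the one it got wrong, which is the credit-assignment gap the summary points to. The sketch uses the population standard deviation; some GRPO implementations use the sample standard deviation instead.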