Coding
Self-Distillation Policy Optimization via Visual Feedback: Bridging Code and Visual Artifacts
The paper introduces Visual-SDPO, a self-distillation policy-optimization framework that uses visual feedback to improve code-generated visual artifacts. Built on a Qwen3-VL-8B-Instruct backbone, it adds Visual-Grounded Code Credit Weighting, which targets supervision spatially, and reports gains of more than 10 absolute points on benchmarks such as ChartMimic and Design2Code. For practitioners, this means higher-quality visual output from code with fewer training steps and no additional inference-time cost, addressing common defects in visual artifacts produced by LLMs.
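The summary does not give the formula behind Visual-Grounded Code Credit Weighting, so the following is only an illustrative sketch of what spatially targeted supervision could look like. It assumes that each code token can be tagged with the image region it renders, that a per-region visual error score is available, and that the distillation signal is a per-token KL divergence between a feedback-conditioned teacher and the student. The names `credit_weights`, `weighted_distillation_loss`, `token_regions`, `region_errors`, and `floor` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def credit_weights(token_regions, region_errors, floor=0.1):
    """Hypothetical credit assignment (an assumption, not the paper's rule).

    Each token gets `floor` plus the visual error of the region it renders,
    then weights are rescaled so that their mean is 1.
    """
    raw = [
        floor + (region_errors.get(region, 0.0) if region is not None else 0.0)
        for region in token_regions
    ]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_distillation_loss(teacher_logits, student_logits, weights):
    """Mean over tokens of weight * KL(teacher || student)."""
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(w * k for w, k in zip(weights, per_token)) / len(per_token)


# Toy example: four code tokens, two of which draw the badly rendered bars.
token_regions = ["title", "bars", "bars", None]
region_errors = {"title": 0.0, "bars": 0.8}  # assumed per-region error scores
weights = credit_weights(token_regions, region_errors)

teacher = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.5, 0.5]]
student = [[2.0, 0.0], [2.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
loss = weighted_distillation_loss(teacher, student, weights)
```

In this toy setup the tokens tied to the high-error region receive larger weights, so a disagreement between teacher and student on those tokens contributes more to the loss than the same disagreement elsewhere would. Because the weights only rescale a training loss, a scheme of this shape would add no cost at inference time, which is consistent with the claim in the summary.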
llm visual artifacts