Training · arXiv cs.AI · 11 d ago

Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding

The paper introduces a quality-aware self-distillation method for vision-language models (VLMs) that targets GUI grounding. It combines soft correctness-aware gating with teacher-probability scaling to improve the quality of coordinate-token teacher signals, which otherwise degrade when student-generated prefixes are inaccurate. The authors report that the combined strategy yields consistent improvements across six GUI grounding benchmarks, making it a candidate approach for training VLMs in coordinate-sensitive applications.

self-distillation · GUI · grounding
