Training
Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
The paper proposes a quality-aware self-distillation method for vision-language models (VLMs) on GUI grounding. It combines soft correctness-aware gating with teacher-probability scaling to improve the quality of the teacher signal on coordinate tokens, mitigating the degradation caused by inaccurate student-generated prefixes. The combined strategy yields consistent improvements across six GUI grounding benchmarks, giving practitioners a more reliable way to train VLMs for coordinate-sensitive applications.
self-distillation GUI grounding
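The summary above names two mechanisms but gives no formulas, so the following is only an illustrative sketch of how a per-token distillation loss could be weighted by both. Everything specific here is an assumption rather than the paper's definition: the function names, the exponential gate based on distance from the predicted point to the target box, the use of the teacher's top-token probability raised to `alpha` as the scaling term, and the choice of KL divergence as the distillation loss.

```python
import math


def soft_correctness_gate(pred_xy, box, temperature=0.1):
    """Hypothetical soft gate: 1.0 when the student's predicted point lies
    inside the target box, decaying smoothly with distance outside it."""
    x, y = pred_xy
    x0, y0, x1, y1 = box
    dx = max(x0 - x, 0.0, x - x1)  # horizontal distance to the box (0 if inside)
    dy = max(y0 - y, 0.0, y - y1)  # vertical distance to the box (0 if inside)
    return math.exp(-math.hypot(dx, dy) / temperature)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def weighted_distill_loss(teacher_logits, student_logits, gate, alpha=1.0):
    """Average per-token KL between teacher and student over coordinate tokens,
    weighted by the correctness gate and by teacher confidence (assumed here
    to be the teacher's top-token probability raised to `alpha`)."""
    total = 0.0
    for t_log, s_log in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_log)
        p_student = softmax(s_log)
        confidence = max(p_teacher) ** alpha
        total += gate * confidence * kl_divergence(p_teacher, p_student)
    return total / len(teacher_logits)


# Usage: coordinates are normalized to [0, 1]; box is (x0, y0, x1, y1).
target_box = (0.4, 0.4, 0.6, 0.6)
gate = soft_correctness_gate((0.9, 0.5), target_box)  # student point misses the box
loss = weighted_distill_loss([[2.0, 0.0]], [[0.0, 2.0]], gate)
```

In this sketch, a rollout whose point lands far from the target contributes little to the loss, and tokens on which the teacher is unsure are down-weighted further, which is one plausible reading of "trusting the right teacher".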