Agents
X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs
The article presents X-OPD, a Cross-Modal On-Policy Distillation framework that aligns the capabilities of Speech LLMs with those of text-based models. The Speech LLM generates its own on-policy rollouts, and a text-based teacher model gives token-level feedback on them, which improves performance on complex tasks. The method is reported to show significant benchmark improvements, addressing the performance degradation that end-to-end speech models exhibit relative to their text-based counterparts. This makes it relevant to practitioners who want to optimize speech model capabilities.
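As a rough illustration of what token-level feedback on an on-policy rollout can look like, the sketch below computes a per-token reverse KL divergence, KL(student || teacher), between a student's and a teacher's next-token distributions. This is a minimal sketch under stated assumptions, not the paper's implementation: the summary does not name the loss, so reverse KL is assumed here because it is a common choice in on-policy distillation, and the function names and toy logits are hypothetical. In a real setup the student logits would presumably come from the Speech LLM on audio input and the teacher logits from the text model scoring the same student-sampled tokens.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_level_reverse_kl(student_logits, teacher_logits):
    """Return KL(student || teacher) for each position of a rollout.

    Both arguments are lists of rows, one row of vocabulary logits per
    generated token. The rollout is assumed to have been sampled from
    the student (on-policy); the teacher only scores it.
    """
    losses = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        s_logp = log_softmax(s_row)
        t_logp = log_softmax(t_row)
        # Sum over the vocabulary: p_student * (log p_student - log p_teacher)
        kl = sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s_logp, t_logp))
        losses.append(kl)
    return losses


# Toy rollout of two tokens over a three-word vocabulary (made-up values).
student_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher_logits = [[2.0, 0.5, -1.0], [3.0, -1.0, -1.0]]

per_token = token_level_reverse_kl(student_logits, teacher_logits)
mean_loss = sum(per_token) / len(per_token)
print(per_token, mean_loss)
```

At the first position the two distributions match, so the feedback is zero. At the second the teacher strongly prefers one token while the student is nearly uniform, so the divergence is positive and would drive the update in an actual training loop.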
speech llm distillation