Research
DeRA-MOS: Optimizing Text-to-Music Evaluation via Decoupled Listwise Ranking and Modality Alignment
DeRA-MOS is a newly proposed decoupled optimization framework for the automatic evaluation of text-to-music (TTM) systems. It addresses limitations of conventional mean opinion score (MOS) estimators by optimizing the two evaluation dimensions with separate objectives: a batch-aware listwise ranking loss for music impression (MI) and a score-anchored modality alignment loss for text alignment (TA). On the MusicEval benchmark, the framework reports significant improvements in rank-based metrics for both MI and TA, along with stronger cross-modal coherence. For TTM practitioners, this offers a more reliable evaluation method that tracks human judgment more closely.
text-to-music, evaluation, llm