Multimodal
CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction
The paper introduces CMI-RewardBench, a benchmark for evaluating music reward models under Compositional Multimodal Instruction (CMI), which integrates text, lyrics, and audio prompts. It presents two datasets: CMI-Pref-Pseudo, with 110k pseudo-labeled samples, and CMI-Pref, a human-annotated dataset for fine-grained tasks. The CMI reward models (CMI-RMs) correlate strongly with human judgments of musicality and alignment, and they support inference-time scaling through top-k filtering, which makes them useful for both music generation and evaluation.
music, reward models, multimodal
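
The summary mentions inference-time scaling through top-k filtering without describing the procedure. As a rough illustration only, the sketch below shows one common reading of the idea: generate several candidates, score each with a reward model, and keep the k highest-scoring ones. The function `top_k_filter`, the clip names, and the toy scores are all hypothetical stand-ins; none of this is the paper's code or the CMI-RM interface.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def top_k_filter(
    candidates: Sequence[T],
    score_fn: Callable[[T], float],
    k: int,
) -> List[Tuple[float, T]]:
    """Score every candidate and return the k best as (score, candidate), best first."""
    if k <= 0:
        return []
    # Keep the original index so candidates themselves are never compared.
    scored = [(score_fn(c), i, c) for i, c in enumerate(candidates)]
    best = heapq.nlargest(k, scored, key=lambda item: item[0])
    return [(score, cand) for score, _, cand in best]


# Hypothetical stand-in for a reward model: a lookup table of made-up scores.
# A real setup would score generated audio against its text/lyrics/audio prompt.
toy_scores = {"clip_a": 0.2, "clip_b": 0.9, "clip_c": 0.5, "clip_d": 0.7}
candidates = list(toy_scores)

kept = top_k_filter(candidates, toy_scores.__getitem__, k=2)
print(kept)
```

In this framing, sampling more candidates before filtering spends more compute at inference time, and the quality of the kept samples depends on how well the scorer tracks human preference.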