RAG
AudioDER: A Deduplication-Enhanced Reasoning Dataset for Post-Training Large Audio-Language Models
AudioDER is a reasoning-oriented dataset for post-training Large Audio-Language Models (LALMs), built to improve audio reasoning by reducing the redundancy found in existing datasets. It contains approximately 191,000 samples, each consisting of an audio clip, a multiple-choice question, answer candidates, a caption, and a chain-of-thought rationale, and it uses a deduplication pipeline to increase corpus diversity. In experiments, post-training Qwen2-Audio-7B-Instruct on AudioDER significantly improves its performance across several audio reasoning benchmarks, suggesting the dataset is a useful resource for advancing audio reasoning in LALMs.
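To make the sample layout concrete, here is a minimal Python sketch of one record with the five components listed above, plus a simple deduplication pass. It is an illustration under assumptions, not the AudioDER pipeline: the summary does not describe how deduplication is done, so the field names, the `dedup_key` normalization, and the exact-match strategy are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AudioSample:
    """One record with the components named in the summary (field names are assumed)."""
    audio_path: str            # path to the audio clip
    question: str              # multiple-choice question
    choices: Tuple[str, ...]   # answer candidates
    answer: str                # correct candidate
    caption: str               # text description of the clip
    rationale: str             # chain-of-thought rationale


def dedup_key(sample: AudioSample) -> str:
    """Hypothetical key: hash of the lowercased, whitespace-collapsed question and caption."""
    text = " ".join(f"{sample.question} {sample.caption}".lower().split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def deduplicate(samples: List[AudioSample]) -> List[AudioSample]:
    """Keep the first sample for each key, preserving input order."""
    seen = set()
    kept = []
    for sample in samples:
        key = dedup_key(sample)
        if key not in seen:
            seen.add(key)
            kept.append(sample)
    return kept
```

Exact matching on normalized text only catches near-verbatim repeats; a real pipeline aimed at corpus diversity would more plausibly compare audio or text embeddings, which this sketch does not attempt.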
audio-language models, dataset, reasoning