Multimodal · arXiv cs.AI · 9 d ago

Spectro-Temporal Interference Confounds Phase Encoding in Spatial Audio Foundation Models

The paper presents a psychoacoustic benchmark for spatial audio models built on the binaural masking level difference (BMLD). The BMLD is the improvement in detecting a signal in noise when the signal's interaural phase differs from the masker's. The benchmark evaluates nine frozen audio models, spanning binaural and monaural self-supervised learning (SSL) models. Dedicated binaural SSL models show effective phase encoding. General-purpose models instead rely on spectro-temporal interference cues, which confound apparent phase encoding. The results underline that spatial audio models need to encode interaural phase accurately, which matters for practitioners building sound localization systems.
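To make the BMLD idea concrete, here is a minimal sketch of the classic stimulus contrast, assuming the common N0S0 vs N0Sπ conditions. It is not the paper's benchmark, and the function names, tone frequency, SNR and sample rate are all illustrative assumptions. A diotic noise masker (N0) is presented identically to both ears, and a tone is added either in phase (S0) or inverted in one ear (Sπ). In N0S0 the two ears receive identical signals, so any detection must rely on monaural spectro-temporal cues. In N0Sπ the tone lowers the interaural correlation, which provides a purely binaural, phase-derived cue.

```python
import math
import random


def make_bmld_stimulus(signal_phase_deg, tone_hz=500.0, snr_db=-10.0,
                       sample_rate=16000, duration_s=0.5, seed=0):
    """Hypothetical helper: return (left, right) sample lists for a diotic
    Gaussian noise masker (N0) plus a tone whose right-ear copy is shifted
    by signal_phase_deg (0 -> S0, 180 -> S-pi). All defaults are assumptions."""
    rng = random.Random(seed)
    n = int(sample_rate * duration_s)
    # N0: the same noise samples go to both ears.
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_power = sum(x * x for x in noise) / n
    # Scale the tone so that tone power / noise power matches snr_db.
    tone_power = noise_power * 10 ** (snr_db / 10)
    amp = math.sqrt(2.0 * tone_power)  # a sine of amplitude A has power A^2 / 2
    phi = math.radians(signal_phase_deg)
    left, right = [], []
    for i in range(n):
        t = i / sample_rate
        left.append(noise[i] + amp * math.sin(2 * math.pi * tone_hz * t))
        right.append(noise[i] + amp * math.sin(2 * math.pi * tone_hz * t + phi))
    return left, right


def interaural_correlation(left, right):
    """Pearson correlation between the two ear signals."""
    n = len(left)
    ml, mr = sum(left) / n, sum(right) / n
    cov = sum((l - ml) * (r - mr) for l, r in zip(left, right))
    vl = sum((l - ml) ** 2 for l in left)
    vr = sum((r - mr) ** 2 for r in right)
    return cov / math.sqrt(vl * vr)


if __name__ == "__main__":
    for label, phase in (("N0S0", 0.0), ("N0Spi", 180.0)):
        rho = interaural_correlation(*make_bmld_stimulus(phase))
        print(f"{label}: interaural correlation = {rho:.3f}")
```

With these assumed settings, N0S0 gives an interaural correlation of 1 (the ears are identical). N0Sπ gives roughly 0.8, close to the idealized value (1 - r)/(1 + r) with r = 10^(SNR/10). A model that truly encodes interaural phase should be able to separate the two conditions from the binaural input alone.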

audio · spatial-models · self-supervised · relevance 0.00 · engagement 0.00
