Research
Inside the Latent Flow: Causal Deciphering of Attention Dynamics in Audio Separation Foundation Models
The paper presents a probing protocol for SAM Audio. It reveals a dual-pathway text-conditioning mechanism that enhances audio separation in flow-matching transformers. Building on this, it introduces Layer-Selective Attention Caching (LSAC), a training-free method that reduces self-attention computation by approximately 25% while maintaining audio quality, with up to 6.7x higher quality retention than naive step reduction. For practitioners, this offers a way to make audio separation inference more efficient without retraining the model or sacrificing performance.
audio separation, transformers, attention dynamics
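
The summary above does not describe how LSAC is implemented, so the following is only a toy sketch of the general idea of caching attention outputs in selected layers across sampling steps. It is not the paper's method. The function names (`self_attention`, `run_sampler`), the choice of cached layers, and the refresh schedule are all hypothetical, picked so that the skipped attention calls add up to 25% of the total.

```python
import math


def self_attention(x):
    """Toy single-head self-attention over a list of token vectors.

    No learned weights: queries, keys, and values are all the input itself.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi / z * v[j] for wi, v in zip(w, x)) for j in range(d)])
    return out


def run_sampler(x, num_layers, num_steps, cached_layers=frozenset(), refresh_every=2):
    """Run a toy multi-step sampler and count attention computations.

    Layers in `cached_layers` (a hypothetical selection) recompute attention
    only on steps divisible by `refresh_every`; on other steps they reuse the
    output cached at the last refresh.
    """
    cache = {}
    attn_calls = 0
    for step in range(num_steps):
        for layer in range(num_layers):
            reuse = (
                layer in cached_layers
                and step % refresh_every != 0
                and layer in cache
            )
            if reuse:
                attn = cache[layer]
            else:
                attn = self_attention(x)
                attn_calls += 1
                cache[layer] = attn
            # Small residual update standing in for the rest of the block.
            x = [[xi + 0.1 * ai for xi, ai in zip(row, arow)]
                 for row, arow in zip(x, attn)]
    return x, attn_calls


tokens = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.05]]

# Baseline: every layer recomputes attention at every step.
_, base_calls = run_sampler(tokens, num_layers=8, num_steps=8)

# Cached variant: the last four layers refresh attention every other step.
_, cached_calls = run_sampler(
    tokens, num_layers=8, num_steps=8, cached_layers=frozenset({4, 5, 6, 7})
)

print(base_calls, cached_calls)       # 64 48
print(1 - cached_calls / base_calls)  # 0.25
```

In this toy setup, four of eight layers skip attention on every other step, which removes 16 of 64 attention calls. Which layers can be cached safely in a real model, and how often they need refreshing, is exactly what a probing study would have to establish; the sketch makes no claim about that.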