Safety
The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions
This paper presents a psychoacoustic framework that exposes the fragility of post-hoc explanation methods in audio deepfake detection, showing that adversaries can manipulate explanation heatmaps without altering model predictions. The study evaluates several state-of-the-art architectures under strict constraints, using domain-specific perceptual audio quality metrics to measure the cost of manipulation. For practitioners, the results highlight vulnerabilities in audio model interpretability and the need for more robust explanation techniques.
explanation, audio-models, deepfake-detection