Adversarial Attacks Leverage Interference Between Features in Superposition
The paper proposes that adversarial vulnerability in neural networks stems from superposition. Networks represent more features than they have dimensions, so the feature directions cannot all be orthogonal, and activating one feature partially activates others. The authors show that this interference makes adversarial attacks predictable: perturbations found with Projected Gradient Descent (PGD) align with the optimal perturbations predicted from the interference geometry. For practitioners, the result offers a mechanistic account of why adversarial attacks work. It also suggests that training procedures and data representations may need to account for interference patterns to improve robustness.
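To make "interference geometry" concrete, here is a minimal toy sketch. It is not the paper's code or experimental setup. The sketch assumes a linear compression h = Wx that packs n = 20 unit-norm feature directions into d = 5 dimensions, in the spirit of toy models of superposition. The attack objective is the pre-ReLU readout w_j · h of a feature j that is inactive in the clean input. The attacker is not allowed to edit feature j's own input coordinate, so any gain has to come through interference, that is, through the overlaps w_j · w_k collected in row j of W^T W. All function names and parameters are hypothetical. Because this readout is linear, L2 PGD recovers the interference direction exactly here. The paper's claim concerns trained, nonlinear networks.

```python
import numpy as np


def make_superposed_weights(n_features: int, d_hidden: int, seed: int = 0) -> np.ndarray:
    """Hypothetical toy: d_hidden x n_features matrix whose unit-norm columns are feature directions."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d_hidden, n_features))
    return W / np.linalg.norm(W, axis=0, keepdims=True)


def target_logit(W: np.ndarray, x: np.ndarray, j: int) -> float:
    """Pre-ReLU readout of feature j from the compressed code h = W x."""
    return float(W[:, j] @ (W @ x))


def interference_direction(W: np.ndarray, j: int, mask: np.ndarray) -> np.ndarray:
    """Row j of the interference matrix W^T W, restricted to coordinates the attacker may edit."""
    return mask * (W.T @ W[:, j])


def pgd_l2(W: np.ndarray, x: np.ndarray, j: int, eps: float, mask: np.ndarray,
           steps: int = 50, lr: float = 0.1) -> np.ndarray:
    """Projected gradient ascent on target_logit inside an L2 ball of radius eps."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        # Gradient of w_j . W (x + delta) w.r.t. delta is W^T w_j (the readout is linear);
        # masked coordinates receive no update.
        grad = mask * (W.T @ W[:, j])
        delta = delta + lr * grad
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)  # project back onto the L2 ball
    return delta


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


n, d, active, target, eps = 20, 5, 3, 7, 0.5
W = make_superposed_weights(n, d)
x = np.zeros(n)
x[active] = 1.0      # clean input: only feature `active` is on
mask = np.ones(n)
mask[target] = 0.0   # attacker may not edit the target feature's own coordinate

delta = pgd_l2(W, x, target, eps, mask)
predicted = interference_direction(W, target, mask)
print("cosine(PGD delta, interference row):", cosine(delta, predicted))
print("logit gain:", target_logit(W, x + delta, target) - target_logit(W, x, target))
```

In this toy, the logit gain equals eps times the norm of the masked interference row. A feature whose direction overlaps strongly with many others is therefore the easiest to switch on with a small perturbation.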
Tags: adversarial, vulnerability, neural networks