Safety
Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines
The paper presents a theoretical analysis of adversarial attacks on Large Language Model (LLM)-based search engines, focusing on ranking manipulation attacks. It models the interactions between attackers and defenders as an Infinitely Repeated Prisoners' Dilemma and identifies key factors, such as attack costs and success rates, that influence strategic behavior. The findings underscore the need for adaptive security strategies: traditional defensive measures may inadvertently incentivize further attacks, which complicates the task of securing LLM-based systems in practice.
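To make the repeated-game framing concrete, here is a minimal sketch of the textbook condition under which cooperation (not attacking) can be sustained in an infinitely repeated Prisoners' Dilemma with a grim-trigger strategy. It is not the paper's model: the function names and the payoff values (temptation, reward, punishment) are hypothetical, and factors such as attack cost and success rate would enter only through those payoffs.

```python
def min_discount_factor(temptation, reward, punishment):
    """Smallest discount factor at which grim-trigger cooperation is stable.

    Standard repeated Prisoners' Dilemma result (not taken from the paper):
    cooperation holds when delta >= (T - R) / (T - P), with T > R > P.
    """
    if not (temptation > reward > punishment):
        raise ValueError("expected temptation > reward > punishment")
    return (temptation - reward) / (temptation - punishment)


def cooperation_is_stable(delta, temptation, reward, punishment):
    """Compare the discounted payoff of cooperating forever with defecting once."""
    if not (0 <= delta < 1):
        raise ValueError("delta must be in [0, 1)")
    # Cooperate in every round: reward R each period, discounted by delta.
    cooperate_forever = reward / (1 - delta)
    # Defect now for T, then receive the punishment payoff P in all later rounds.
    defect_once = temptation + delta * punishment / (1 - delta)
    return cooperate_forever >= defect_once


# Hypothetical payoffs: T = 5 (attack while the other cooperates),
# R = 3 (both cooperate), P = 1 (both attack).
threshold = min_discount_factor(5, 3, 1)
patient = cooperation_is_stable(0.6, 5, 3, 1)
impatient = cooperation_is_stable(0.4, 5, 3, 1)
```

Under these assumed payoffs, players who weight future rounds heavily enough have no incentive to attack, while less patient players do. Any change that raises the one-shot gain from attacking relative to mutual cooperation raises the threshold.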
adversarial attacks, llm, search engines