Safety
Evaluating Prompting-Based Defenses Against Domain-Camouflaged Injection Attacks
This study evaluates five prompting-based defenses against domain-camouflaged injection attacks across three model families: Claude Haiku, Llama 3.1 (8B), and Gemini 2.0 Flash. The findings indicate that paraphrasing retrieved content is the most effective defense, reducing attack success rates by 55-84%, while spotlighting shows varying effectiveness depending on the model. This research highlights the model-dependent nature of defense strategies and underscores the need for tailored solutions in high-risk domains such as finance, where baseline attack success rates remain significant.
defenseinjection attacksprompting