Safety
AutoDojo: Adaptive Attacks Expose Superficial Defenses and User-Underspecification Limits in LLM Agents
The article introduces AutoDojo, an adaptive framework that extends the static AgentDojo benchmark for evaluating defenses against indirect prompt injection (IPI) attacks on LLM-powered agents. With adaptive strategies, AutoDojo's attacks achieve significantly higher attack success rates (ASR), showing that existing defenses are often ineffective, particularly on action-open tasks, where the user's specification is vague. Because traditional static benchmarks do not account for dynamic attack strategies, the results highlight the need for more robust evaluation methods and defenses in LLM applications.
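To make the ASR metric mentioned above concrete, here is a minimal sketch of how it could be tallied per task type and attack mode. It assumes ASR is simply successful injections divided by attempts; the function name, the record layout, the `"other"` task label, and the toy outcomes are all hypothetical illustrations, not AutoDojo's API or its reported results.

```python
from collections import defaultdict


def attack_success_rate(outcomes):
    """Return ASR per (task_type, attack_mode) as successes / attempts.

    `outcomes` is an iterable of (task_type, attack_mode, succeeded) tuples.
    This record layout is an assumption made for illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, attempts]
    for task_type, attack_mode, succeeded in outcomes:
        counts[(task_type, attack_mode)][0] += int(succeeded)
        counts[(task_type, attack_mode)][1] += 1
    return {key: s / n for key, (s, n) in counts.items()}


# Hypothetical toy outcomes, NOT measurements from the article.
toy_outcomes = [
    ("action-open", "static", False),
    ("action-open", "static", True),
    ("action-open", "adaptive", True),
    ("action-open", "adaptive", True),
    ("other", "static", False),
    ("other", "adaptive", True),
    ("other", "adaptive", False),
]

rates = attack_success_rate(toy_outcomes)
for (task_type, attack_mode), asr in sorted(rates.items()):
    print(f"{task_type:12s} {attack_mode:9s} ASR = {asr:.2f}")
```

Grouping by both task type and attack mode is what lets a static-versus-adaptive gap, and any concentration of that gap on action-open tasks, show up in the numbers.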
security, LLM, adaptive attacks