Safety · arXiv cs.AI · 15 d ago

Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems

The paper presents a probabilistic model of how agentic AI systems are exposed to model-guided automated attacks, focusing on prompt-injection and jailbreak methods. It introduces a defense strategy called Contextual Misdirection via Progressive Engagement (CMPE), which replaces predictable refusal responses with misleading information to confuse attackers. According to the summary, this approach lowers the attacker success rate (ASR) on jailbreak benchmarks by up to two orders of magnitude and effectively neutralizes verified attacks in specific test scenarios.

agentic-ai · attacks · defense · relevance 0.00 · engagement 0.00
