Safety · arXiv cs.AI · 12 d ago

Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks

The paper presents JAWS-Bench, a benchmark that assesses the security of AI code agents through systematic jailbreaking attacks across three workspace regimes: empty (JAWS-0), single-file (JAWS-1), and multi-file (JAWS-M). In the prompt-only JAWS-0 regime, attacks achieve 61% compliance. In JAWS-1 and JAWS-M, compliance approaches 100%, with mean attack success rates of 71% and 75%, respectively. The results indicate that wrapping LLMs in agents significantly increases attack success. They also highlight the need for execution-aware defenses and improved design strategies to mitigate the risks of deploying code-capable LLMs in software engineering workflows.

jailbreak · llm · security · agents
