Safety · arXiv cs.AI · 4 days ago

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

The paper introduces CodeSpear, a new jailbreak attack that exploits Grammar-Constrained Decoding (GCD) to make Large Language Models (LLMs) generate malicious code. This exposes a significant security risk in a technique normally used to make model outputs more reliable. To counter the vulnerability, the authors propose CodeShield, a safety-alignment method that teaches LLMs to produce semantically harmless honeypot code while still giving natural-language refusals. In experiments, CodeSpear increases the attack success rate by more than 30% over existing jailbreak methods, underscoring the need for stronger security measures in LLM applications.
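To make the underlying mechanism concrete, here is a minimal, generic sketch of grammar-constrained decoding. It is not the paper's CodeSpear attack or its CodeShield defense. It assumes a hypothetical toy model (`toy_logits`) that strongly prefers a natural-language refusal, and a hypothetical toy grammar (a small DFA in `TRANSITIONS`) that accepts only assignments like `x = 1`. At each step, tokens the grammar cannot accept are masked to negative infinity before greedy selection, so the output must follow the grammar even when the model would rather refuse.

```python
import math

# Hypothetical toy vocabulary; a real LLM has tens of thousands of tokens.
VOCAB = ["Sorry", "I", "can't", "x", "=", "1", "2", "<eos>"]

# Hypothetical refusal that the toy model strongly prefers to produce.
REFUSAL = ["Sorry", "I", "can't", "<eos>"]

# Made-up small baseline scores for the code-like tokens.
BASELINE = {"x": 1.0, "=": 0.9, "2": 0.8, "1": 0.7, "<eos>": 0.1}

# Hypothetical grammar as a DFA: accepts only "x = 1" or "x = 2".
TRANSITIONS = {(0, "x"): 1, (1, "="): 2, (2, "1"): 3, (2, "2"): 3}
ACCEPTING = {3}


def toy_logits(prefix):
    """Stand-in for an LLM forward pass: strongly favors continuing the refusal."""
    logits = {tok: BASELINE.get(tok, 0.0) for tok in VOCAB}
    if len(prefix) < len(REFUSAL) and prefix == REFUSAL[: len(prefix)]:
        logits[REFUSAL[len(prefix)]] += 10.0
    return logits


def allowed_tokens(state):
    """Tokens the grammar permits from a DFA state (<eos> only when accepting)."""
    allowed = {tok for (s, tok) in TRANSITIONS if s == state}
    if state in ACCEPTING:
        allowed.add("<eos>")
    return allowed


def decode(constrained, max_len=10):
    """Greedy decoding; when constrained, grammar-disallowed tokens get -inf."""
    out, state = [], 0
    for _ in range(max_len):
        logits = toy_logits(out)
        if constrained:
            allowed = allowed_tokens(state)
            logits = {t: (v if t in allowed else -math.inf) for t, v in logits.items()}
        tok = max(logits, key=logits.get)  # greedy pick of the best remaining token
        if tok == "<eos>":
            break
        out.append(tok)
        if constrained:
            state = TRANSITIONS[(state, tok)]  # advance the grammar state
    return out


print(decode(constrained=False))  # ['Sorry', 'I', "can't"]
print(decode(constrained=True))   # ['x', '=', '2']
```

The unconstrained call follows the toy model's preference and refuses. The constrained call can never emit "Sorry", because the grammar does not allow it, so greedy selection falls through to the highest-scoring grammatical token at every step.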

LLM · malicious code · grammar
