Safety
JailbreakOPT: Tool-Assisted Iterative Jailbreak Prompt Optimization
JailbreakOPT is a newly proposed framework for iteratively optimizing jailbreak prompts against large language models (LLMs). It addresses limitations of existing methods by organizing atomic jailbreak prompts into a tool library. It then combines a unified optimization approach with a contextual bandit strategy based on Thompson sampling, which improves prompt efficacy and reduces the number of attempts needed for a successful attack. The framework demonstrated higher attack success rates (ASR) and better efficiency across various target LLMs, a result relevant to practitioners working to harden AI systems against jailbreak vulnerabilities.
jailbreak, prompt optimization, llm
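
The summary names Thompson sampling as the selection strategy, so a small illustration of that idea may help. The sketch below is not the JailbreakOPT implementation. It is a generic, non-contextual Beta-Bernoulli Thompson sampler choosing among abstract "arms" (standing in for entries of a tool library), with rewards drawn from simulated success rates. The class `BetaThompsonSampler`, the function `run_simulation`, and the example rates are all hypothetical names and values chosen for illustration. The framework described above uses a contextual bandit, which this simplified version omits.

```python
import random


class BetaThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms (illustrative only)."""

    def __init__(self, n_arms, seed=None):
        # Beta(1, 1) is a uniform prior over each arm's unknown success rate.
        self.alpha = [1.0] * n_arms  # 1 + observed successes
        self.beta = [1.0] * n_arms   # 1 + observed failures
        self.rng = random.Random(seed)

    def select(self):
        # Draw one plausible success rate per arm from its posterior,
        # then pick the arm whose draw is largest.
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, success):
        # A binary outcome updates the chosen arm's posterior counts.
        if success:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0


def run_simulation(true_rates, n_rounds=2000, seed=0):
    """Simulate the sampler against made-up success rates; return pulls per arm."""
    sampler = BetaThompsonSampler(len(true_rates), seed=seed)
    env = random.Random(seed + 1)  # separate RNG for the simulated environment
    pulls = [0] * len(true_rates)
    for _ in range(n_rounds):
        arm = sampler.select()
        success = env.random() < true_rates[arm]
        sampler.update(arm, success)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical success rates; the sampler does not see these directly.
    print(run_simulation([0.1, 0.2, 0.7]))
```

Because each choice is a draw from the posterior, arms with little data are still tried occasionally, while arms that keep succeeding are chosen more and more often. That concentration of trials on the better arms is the mechanism by which a bandit strategy can reduce the number of attempts needed.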