Safety · arXiv cs.AI · 9 d ago

GAS-Leak-LLM: Genetic Algorithm-Based Suffix Optimization for Black-Box LLM Jailbreaking

The article introduces GAS-Leak-LLM, a jailbreaking attack that uses a genetic algorithm to evolve adversarial suffixes that circumvent the safety constraints of black-box large language models (LLMs). The method needs no access to model parameters: it applies selection, mutation, and crossover to search the prompt space for effective adversarial inputs. The findings expose vulnerabilities in current safety mechanisms and underline the need for stronger defenses against such adversarial manipulation in deployed AI systems.
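
For readers unfamiliar with the selection, crossover, and mutation loop mentioned in the summary, here is a minimal, generic sketch of a genetic algorithm that maximises a black-box score over fixed-length strings. It is not the paper's implementation: the names `evolve`, `toy_score`, `ALPHABET`, and `TARGET`, the truncation-selection and single-point-crossover choices, and all parameter values are assumptions made for illustration. The score here is a harmless toy (character matches against a fixed string), and no language model is queried.

```python
import random

# Hypothetical search alphabet and toy objective, chosen only for illustration.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
TARGET = "hello world"


def toy_score(candidate: str) -> int:
    """Black-box stand-in: number of positions that match TARGET."""
    return sum(1 for a, b in zip(candidate, TARGET) if a == b)


def evolve(score, length, pop_size=60, generations=300,
           mutation_rate=0.05, seed=0):
    """Maximise score(candidate) over strings of `length` characters (length >= 2)."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in range(length))
                  for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        # Selection: keep the top-scoring half (truncation selection).
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        if score(parents[0]) > score(best):
            best = parents[0]
        # Elitism: the current best candidate survives unchanged.
        children = [parents[0]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: splice two parents at a random cut point.
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            # Mutation: resample each character with a small probability.
            child = "".join(
                rng.choice(ALPHABET) if rng.random() < mutation_rate else ch
                for ch in child
            )
            children.append(child)
        population = children
    return max(population + [best], key=score)


if __name__ == "__main__":
    result = evolve(toy_score, len(TARGET))
    print(result, toy_score(result))
```

The search only ever calls `score` on candidates and never inspects how the score is computed, which is the sense in which the approach is black-box. Because the best candidate is carried forward each generation, the best score found never decreases as generations are added.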

llm · jailbreaking · adversarial-attacks
