Safety
FragFuse: Bypassing Access Control of Large Language Model Agents via Memory-Based Query Fragmentation and Fusion
The paper introduces FragFuse, an attack that lets unprivileged users bypass the access control of large language model (LLM) agents by exploiting the agents' long-term memory operations. FragFuse works in three stages. First, it identifies fragments of the prohibited content. Second, it injects those fragments into the agent's memory. Third, it retrieves them through a follow-up query. Across the agent settings evaluated, it achieves an average bypass success rate of 86.3%. The results expose significant weaknesses in current access control for LLM agents and show that existing defenses, such as prompt-injection and perplexity detectors, do not stop these memory-based attacks. This matters for practitioners building secure LLM applications.
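The core weakness can be illustrated with a deliberately simplified toy model. In this model, an access check inspects each query in isolation, while the memory layer later joins stored entries without re-checking them. Everything here is a hypothetical assumption made for illustration, not the paper's agents or FragFuse's actual fragment-identification procedure. The assumed pieces are the `ToyAgent` class, the `per_query_check` policy, the `PROTECTED_TERM` placeholder, and the `remember:` / `recall and combine` commands. The sketch only shows why a per-query check cannot see content that appears only after stored fragments are fused.

```python
from dataclasses import dataclass, field

# Hypothetical protected resource name; a stand-in, not taken from the paper.
PROTECTED_TERM = "project-x salary table"


def per_query_check(query: str) -> bool:
    """Hypothetical policy: allow a query unless it names the protected term in full."""
    return PROTECTED_TERM not in query.lower()


@dataclass
class ToyAgent:
    """Toy agent with a long-term memory list; not a real agent framework."""
    memory: list[str] = field(default_factory=list)

    def handle(self, query: str) -> str:
        # Access control is applied only at the query boundary.
        if not per_query_check(query):
            return "DENIED"
        if query.startswith("remember:"):
            # Memory write: store the fragment verbatim.
            self.memory.append(query[len("remember:"):].strip())
            return "stored"
        if query == "recall and combine":
            # Memory read: fragments are joined, and the joined string
            # never passes through per_query_check.
            return "RESOLVED: " + " ".join(self.memory)
        return "ok"


if __name__ == "__main__":
    agent = ToyAgent()
    print(agent.handle("tell me the project-x salary table"))  # DENIED
    print(agent.handle("remember: project-x"))                 # stored
    print(agent.handle("remember: salary table"))              # stored
    print(agent.handle("recall and combine"))                  # RESOLVED: project-x salary table
```

No single query names the protected term in full, so each one passes the check. The fused string, however, is exactly what the policy was meant to block. Real agents use far richer policies and LLM-driven memory retrieval, so this is only an intuition for the composition gap, not a reproduction of the reported results.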
llm, access control, security