Safety
From ASR to ASP: Evaluating Prompt Attack Vulnerabilities Against Open-Source LLMs
This paper evaluates 14 open-source and 3 closed-source Large Language Models (LLMs) for vulnerability to prompt injection attacks and introduces a new metric, Attack Success Probability (ASP), which accounts for uncertainty in model responses. The study finds that models such as Stablelm2, Mistral, Openchat, and Vicuna are highly vulnerable, with a hypnotism attack achieving approximately 90% ASP, while ignore-prefix attacks exceed 60% ASP on every open-source model evaluated. These findings underscore the need for stronger security measures in LLM applications, particularly in sensitive domains such as finance and healthcare.
prompt attacks, open-source LLMs, security risks