Safety
PI-Hunter: Automated Red-Teaming for Exposing and Localizing Prompt Injections
PI-Hunter is an automated auditing framework that exposes and localizes prompt injections in large language model (LLM) agents. It generates realistic test cases and evolves them through feedback-driven exploration, surfacing latent malicious instructions hidden in the external environments an agent interacts with. Extensive experiments show that PI-Hunter significantly improves vulnerability exposure and attack-surface coverage over existing red-teaming methods, helping developers identify and mitigate security risks in LLM applications.
red-teaming, prompt injection, llm