Agents
The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems
The article presents a new evaluation framework for autonomous penetration capabilities in LLM-powered AI systems, addressing limitations in existing methodologies. The framework comprises two tiers of target server environments and a general-purpose agent architecture equipped with cybersecurity tools, and it is used to assess 19 open-weight and proprietary LLMs. Penetration success rates range from 10.7% to 69.3%, and the results indicate that advances in LLM capabilities correlate with improved autonomous penetration performance, a finding that matters for understanding AI's role in cybersecurity.
autonomous AI, cybersecurity, LLM, penetration testing