Agents · arXiv cs.AI · 7 d ago

AgentCyberRange: Benchmarking Frontier AI Systems in Realistic Cyber Ranges

AgentCyberRange is an open, multi-range infrastructure for benchmarking frontier AI systems in realistic cyber attack scenarios. It comprises 110 vulnerabilities across 15 web applications and 8 enterprise-like environments, and it evaluates autonomous attack strategies in two stages: web exploitation and post-exploitation. In the reported evaluation, GPT-5.5 with Codex achieved a 16.1% success rate on web exploitation tasks and 31.7% on post-exploitation tasks. These results underscore the importance of realistic testing environments for understanding AI’s offensive capabilities and identifying emerging vulnerabilities.
