Research
Benchmarking and Exploring the Capabilities of LLMs for Attack Investigations
The paper introduces AuditBench, a benchmark dataset for evaluating LLMs on security-related investigations of system audit logs. It covers more than 50 scenarios spanning both benign and malicious activity. The paper assesses five leading LLMs on four common log-investigation tasks and highlights how model size, data representation, and prompt construction influence outcomes and error profiles. For practitioners, the work provides a structured framework for assessing LLM capabilities in security operations and identifies areas for improvement in future model development.
llm, audit logs, benchmarking