Research · arXiv cs.CL · 2 d ago

Benchmarking and Exploring the Capabilities of LLMs for Attack Investigations

The paper introduces AuditBench, a benchmark dataset for evaluating LLMs on security-related system audit log investigations, covering over 50 scenarios of both benign and malicious activity. It assesses five leading LLMs across four common log-investigation tasks and highlights how model size, data representation, and prompt construction influence outcomes and error profiles. For practitioners, the work offers a structured framework for assessing LLM capabilities in security operations and identifies areas for improvement in future model development.

llm · audit logs · benchmarking
