Training · arXiv cs.AI · 15 d ago

Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software

The paper introduces CWE-Trace, a framework for assessing the vulnerability detection capabilities of LLMs on 834 curated Linux kernel samples spanning 74 CWEs. It evaluates eight vanilla LLMs and 15 LoRA fine-tuned models. It finds that data contamination does not improve performance, that the models show persistent failure modes, and that the best detection score reaches only 52.1%. The results suggest that fine-tuning does not improve the models' underlying security reasoning, which points to a critical gap in LLMs' ability to understand vulnerabilities in systems software.

fine-tuning · vulnerability detection · llm · cwe-trace
