Training
Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software
The paper introduces CWE-Trace, a framework for assessing the vulnerability detection capabilities of LLMs on 834 curated Linux kernel samples spanning 74 CWEs. It evaluates eight vanilla LLMs and 15 LoRA fine-tuned models and finds that data contamination does not enhance performance and that the models exhibit persistent failure modes, with a maximum detection score of only 52.1%. The authors conclude that fine-tuning does not improve the models' underlying security reasoning, which points to a critical gap in LLMs' ability to understand vulnerabilities in systems software.
fine-tuning, vulnerability detection, llm, cwe-trace