Coding · arXiv cs.AI · 9 d ago

Snyk VulnBench JS 1.0: Can LLMs Find the Same Bugs Twice?

Snyk VulnBench JS 1.0 reports findings from 300 repeated vulnerability scans of JavaScript code run with agentic large language models (LLMs). Repeatability was uneven: only 22 of 161 unique unmatched findings appeared consistently across five runs, while reference-matched findings from Claude were more stable. The results underline the value of combining LLMs with deterministic static application security testing (SAST), as the two approaches have complementary strengths.
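
The repeatability figure above (22 of 161, roughly 14%) can be read as "how many unique findings show up in every repeated scan". The sketch below illustrates one way to compute such a count. It is not the study's method: the finding keys (file, line, rule), the sample data, and the function names are all hypothetical, and it assumes "consistently" means "present in all runs".

```python
from collections import Counter

def run_counts(runs):
    """Return, for each finding key, the number of runs it appeared in.

    `runs` is a list of sets, one set of finding keys per scan.
    """
    counts = Counter()
    for findings in runs:
        # set() guards against a finding being listed twice in one run
        counts.update(set(findings))
    return counts

def stable_findings(runs):
    """Return the findings that appeared in every run."""
    counts = run_counts(runs)
    return {key for key, n in counts.items() if n == len(runs)}

# Hypothetical findings from five scans of the same code.
# Keys are (file, line, rule); real tools may need fuzzier matching.
A = ("app.js", 10, "xss")
B = ("db.js", 42, "sqli")
C = ("util.js", 7, "path-traversal")
runs = [{A, B}, {B}, {B, C}, {A, B}, {B}]

stable = stable_findings(runs)
unique = run_counts(runs)
print(f"{len(stable)} of {len(unique)} unique findings appeared in every run")
```

Exact-key matching is the simplest choice here; in practice, deciding whether two LLM-reported findings are "the same" is itself a design decision that affects the repeatability number.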

llm · security · vulnerability
