Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software
Summary
The CWE-Trace framework evaluates the ability of large language models (LLMs) to detect vulnerabilities in systems software, specifically the Linux kernel. Using 834 manually curated Linux kernel samples spanning 74 Common Weakness Enumerations (CWEs) and a strict temporal split (pre-2025 versus post-cutoff), the study assessed 8 vanilla LLMs and 15 LoRA fine-tuned variants. Two findings stand out. First, data contamination offers no measurable advantage: 84% of nominally contaminated samples lack usable memorization signals, for reasons including absent functions and a 31% CWE misclassification rate. Second, LLMs exhibit stable, systematic failure modes (DFI ranging from -85.5 to +94.8 pp) that persist despite fine-tuning. The authors call this "calibration without comprehension": output thresholds shift without any improvement in the underlying security reasoning. The best detection score was only 52.1% (+2.1 pp above chance), and Top-1 accuracy for exact CWE ranking stayed below 1.3%, confirming that current LLMs lack reliable security reasoning for systems software.
Key takeaway
AI security engineers evaluating LLMs for vulnerability detection should recognize that current fine-tuning strategies mainly shift output thresholds rather than instill genuine security reasoning. Model selection must account for backbone directional biases (for example, paranoid or skeptical) that persist despite training. Focus on acquiring higher-fidelity training data that precisely pairs CVEs with root-cause functions, and explore fine-tuning methods designed to correct these deep-seated priors, because raw accuracy alone is insufficient to gauge reliability.
Key insights
LLMs fine-tuned for vulnerability detection calibrate outputs without genuine security comprehension, showing stable, systematic failure modes.
Principles
- Backbone directional priors dominate fine-tuning.
- Data contamination provides no measurable advantage.
- Detection and CWE understanding are decoupled capabilities.
Method
CWE-Trace uses a temporal split, context-aware vulnerable–patched pairs, and diagnostic metrics (DFI, HDD) to evaluate LLMs on three tasks: non-targeted detection, targeted detection, and CWE classification.
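
The temporal split and the vulnerable–patched pairing can be sketched roughly as follows. This is a minimal illustration, not the CWE-Trace implementation: the `Sample` fields, the function names, and the choice of the fix date as the split key are all assumptions made for the example, and the cutoff is passed in by the caller rather than taken from the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sample:
    """One vulnerable/patched pair. All field names are hypothetical."""
    sample_id: str
    cwe: str
    fix_date: date          # assumed split key: when the fix landed
    vulnerable_code: str    # function before the fix
    patched_code: str       # same function after the fix


def temporal_split(samples, cutoff):
    """Split samples into (before cutoff, on or after cutoff)."""
    before = [s for s in samples if s.fix_date < cutoff]
    after = [s for s in samples if s.fix_date >= cutoff]
    return before, after


def paired_examples(sample):
    """Expand one pair into two labeled examples (1 = vulnerable, 0 = patched)."""
    return [(sample.vulnerable_code, 1), (sample.patched_code, 0)]
```

Evaluating both halves of each pair keeps the two classes balanced, which is why a model that always gives the same answer scores 50% under this setup.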
In practice
- Prioritize high-fidelity training data that pairs each CVE with its root-cause function.
- Use DFI to diagnose LLM decision biases (paranoid/skeptical).
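
This summary does not give the formula for DFI, so the sketch below uses a stand-in: a signed bias score computed as false positive rate minus false negative rate, in percentage points. The function names and the sign convention (positive for a "paranoid" model that over-flags, negative for a "skeptical" one that under-flags) are assumptions for illustration only and may differ from the paper's definition. A second helper reproduces the "pp above chance" arithmetic, assuming a balanced set where chance is 50%.

```python
def directional_bias_pp(labels, predictions):
    """Signed bias in percentage points: FPR minus FNR.

    Stand-in metric, NOT the paper's DFI. Labels and predictions use
    1 = vulnerable, 0 = not vulnerable.
    """
    pairs = list(zip(labels, predictions))
    positives = sum(1 for y, _ in pairs if y == 1)
    negatives = sum(1 for y, _ in pairs if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one sample of each class")
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
    return 100.0 * (false_pos / negatives - false_neg / positives)


def accuracy_above_chance_pp(labels, predictions, chance=0.5):
    """Accuracy minus the chance rate, in percentage points."""
    if not labels:
        raise ValueError("labels must be non-empty")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return 100.0 * (correct / len(labels) - chance)


# A model that flags everything is maximally "paranoid" under this score,
# yet sits exactly at chance on a balanced set.
labels = [1, 0, 1, 0]
always_vulnerable = [1, 1, 1, 1]
bias = directional_bias_pp(labels, always_vulnerable)
margin = accuracy_above_chance_pp(labels, always_vulnerable)
```

Reporting a signed score alongside accuracy makes the point of the summary concrete: two models with near-chance accuracy can fail in opposite directions, and accuracy alone hides which way.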
Topics
- LLM Vulnerability Detection
- Fine-Tuning LLMs
- CWE-Trace Framework
- Data Contamination
- Linux Kernel Security
- Diagnostic Metrics (DFI, HDD)
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.