The Hitchhiker's Guide to Program Analysis, Part III: Mostly Harmless LLMs
Summary
Evident is a bug analysis system that improves static warning triage by separating Large Language Model (LLM) assistance from formal reasoning about program behavior. It uses an LLM to construct warning-specific analysis harnesses, which are validated before being passed to a formal analysis backend such as Frama-C/Eva. No-bug decisions are therefore grounded in formal methods rather than LLM judgment. On 200 Android kernel driver warnings, Evident correctly classified 151 cases (about 76%) and discharged 111 false alarms without dismissing any confirmed bug. It also rediscovered a vulnerability that prior LLM-based filtering and manual triage had overlooked. The core implementation is approximately 22.2 KLOC of Python.
Key takeaway
AI security engineers and research scientists who triage static analysis warnings should keep two roles separate: LLMs construct the analysis context, and formal methods decide whether a warning can be discharged. Do not rely on LLM-generated rationales for "no-bug" verdicts, because doing so risks overlooking real vulnerabilities. Instead, pair LLM-generated harnesses with rigorous validation and a formal analysis backend so that triage stays conservative. In the reported evaluation, this approach discharged false alarms without dismissing any confirmed bug, avoiding the false negatives observed with LLM-only filtering.
Key insights
Decisions about program behavior must be grounded in formal analysis; LLMs assist with context construction, not with verdicts.
Principles
- LLMs aid bug analysis, but formal methods decide program behavior.
- Discharging a warning requires proving that the error state is unreachable.
- Validate LLM-generated artifacts before formal analysis.
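The principles above can be read as a small decision rule. The following is a minimal sketch, not Evident's actual code: the names `BackendResult`, `Verdict`, and `decide` are hypothetical, and the three-valued backend result is an assumption made for illustration. The point it shows is that the LLM's opinion is never an input to the discharge decision.

```python
from enum import Enum


class BackendResult(Enum):
    """Hypothetical outcomes of the formal backend for one warning."""
    UNREACHABLE = "unreachable"  # error state proven unreachable
    REACHABLE = "reachable"      # error state could not be ruled out
    UNKNOWN = "unknown"          # timeout, crash, or inconclusive run


class Verdict(Enum):
    DISCHARGED = "discharged"    # warning dropped as a false alarm
    RETAINED = "retained"        # warning kept for further triage


def decide(harness_admitted: bool,
           backend: BackendResult,
           llm_says_benign: bool = False) -> Verdict:
    """Discharge only on a formal proof over a validated harness.

    `llm_says_benign` is accepted but deliberately ignored, to make
    explicit that an LLM rationale cannot discharge a warning.
    """
    if harness_admitted and backend is BackendResult.UNREACHABLE:
        return Verdict.DISCHARGED
    # Every other combination is handled conservatively.
    return Verdict.RETAINED
```

For example, `decide(True, BackendResult.UNKNOWN, llm_says_benign=True)` still returns `Verdict.RETAINED`: an inconclusive formal result keeps the warning alive regardless of what the LLM believes.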
Method
Evident uses an LLM to construct a warning-specific analysis harness and validates that harness through admission checks. A formal backend (e.g., Frama-C/Eva) then performs the final reachability check.
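The three stages described here could be wired together roughly as follows. This is a sketch under stated assumptions: `StaticWarning`, `triage`, and the two example checks are hypothetical names, the LLM and the backend are passed in as plain callables rather than real integrations, and the specific admission checks are illustrative stand-ins, since the summary does not list the checks Evident applies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StaticWarning:
    function: str  # function containing the warned statement
    line: int      # line number reported by the static analyzer


# An admission check inspects harness source text for one warning.
Check = Callable[[str, StaticWarning], bool]


def calls_target(harness: str, w: StaticWarning) -> bool:
    """Illustrative check: the harness must call the warned function."""
    return f"{w.function}(" in harness


def has_entry_point(harness: str, w: StaticWarning) -> bool:
    """Illustrative check: the harness must define an entry point."""
    return "main(" in harness


def triage(w: StaticWarning,
           generate_harness: Callable[[StaticWarning], str],
           checks: List[Check],
           prove_unreachable: Callable[[str, StaticWarning], bool]) -> str:
    """Return "discharged" or "retained" for a single warning."""
    harness = generate_harness(w)            # stage 1: LLM builds context
    if not all(check(harness, w) for check in checks):
        return "retained"                    # stage 2: harness rejected
    # Stage 3: only the formal backend can discharge the warning.
    return "discharged" if prove_unreachable(harness, w) else "retained"
```

In this layout the backend is never invoked on a harness that fails admission, so a malformed or off-target harness cannot produce a no-bug verdict.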
In practice
- Use type-preserving abstract-value initialization for inputs.
- Implement validation checks for LLM-generated harnesses.
- Separate LLM context generation from formal analysis verdicts.
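As one way to picture type-preserving abstract-value initialization, the sketch below generates C harness statements that give each input field the full range of its own type instead of a concrete value. It is an assumption-laden illustration, not Evident's generator: `init_field` and `init_struct` are hypothetical helpers, and the builtin names (`Frama_C_interval`, `Frama_C_unsigned_int_interval`, `Frama_C_unsigned_char_interval`, `Frama_C_make_unknown`) are taken from Frama-C's `__fc_builtin.h` and should be verified against the Frama-C version in use.

```python
from typing import Dict, List

# Assumed mapping from C scalar types to Frama-C interval builtins,
# with the bounds that cover the whole type.
INTERVAL_BUILTINS = {
    "int": ("Frama_C_interval", "INT_MIN", "INT_MAX"),
    "unsigned int": ("Frama_C_unsigned_int_interval", "0", "UINT_MAX"),
    "unsigned char": ("Frama_C_unsigned_char_interval", "0", "UCHAR_MAX"),
}


def init_field(lvalue: str, c_type: str) -> str:
    """Emit one C statement that initializes `lvalue` abstractly."""
    if c_type in INTERVAL_BUILTINS:
        fn, lo, hi = INTERVAL_BUILTINS[c_type]
        return f"{lvalue} = {fn}({lo}, {hi});"
    # Fallback for other types: mark the object's bytes as unknown.
    return f"Frama_C_make_unknown((char *)&{lvalue}, sizeof({lvalue}));"


def init_struct(var: str, fields: Dict[str, str]) -> List[str]:
    """Emit one initialization statement per field of a struct variable."""
    return [init_field(f"{var}.{name}", c_type)
            for name, c_type in fields.items()]
```

Calling `init_struct("req", {"len": "unsigned int", "hdr": "struct hdr"})` yields one interval assignment for `req.len` and one `Frama_C_make_unknown` call for `req.hdr`. Keeping each value within its declared type, rather than constraining it further, is what lets the backend's "unreachable" result cover every input the driver could receive.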
Topics
- Program Analysis
- Large Language Models
- Static Analysis
- Bug Detection
- Kernel Drivers
- Formal Methods
- Harness Validation
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.