WildCode Revisited: A Comprehensive Empirical Study on the Security of LLM-Generated Code
Summary
A comprehensive empirical study, "WildCode Revisited," analyzed the security of code generated by ChatGPT using 82,843 real-world conversations extracted from the WildChat dataset (April 2023-May 2024). The research confirms that LLM-generated code frequently exhibits significant security vulnerabilities. Key findings include a 20.61% vulnerability rate for weak hash functions, 3.93% for SQL injection, and 14.85% of C/C++ programs containing memory safety issues. Notably, all 30 Java deserialization instances examined were vulnerable, and approximately one-third of regular expressions were susceptible to ReDoS attacks. The study also revealed that 14.4% of Python modules and 3.5% of JavaScript packages generated were "hallucinated" (non-existent). Furthermore, user intent analysis showed that "Secure Coding" was rarely prioritized in queries, even when users encountered buggy code, indicating a significant gap in security awareness.
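Two of the vulnerability classes named above can be made concrete with a minimal Python sketch. It is not taken from the study: the `users` table, the helper names, and the injection payload are all hypothetical and chosen only for illustration.

```python
import hashlib
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable pattern: user input is concatenated into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer pattern: a parameterized query keeps the input out of the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def fingerprint_weak(data: bytes) -> str:
    # MD5 is the kind of weak hash a scanner would flag.
    return hashlib.md5(data).hexdigest()


def fingerprint_strong(data: bytes) -> str:
    # SHA-256 is a common replacement for integrity fingerprints.
    return hashlib.sha256(data).hexdigest()


def demo():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    payload = "x' OR '1'='1"  # classic injection string
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)
```

With this payload the concatenated query returns every row, while the parameterized query treats the payload as a literal name and returns nothing. For password storage, a dedicated key derivation function such as `hashlib.scrypt` is more appropriate than a plain hash.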
Key takeaway
Software engineers integrating LLM-generated code should assume it carries security risks. Scan all AI-produced code with static analysis tools such as OpenGrep for weak hashes, SQL injection, and memory safety issues. Do not rely on LLMs to self-correct or on users to prompt for security: explicitly request secure coding practices, and verify every package dependency to catch hallucinated modules.
Key insights
LLM-generated code, examined here through ChatGPT conversations, frequently contains significant security vulnerabilities that users largely leave unaddressed.
Principles
- LLM-generated code often lacks explicit security features.
- User queries rarely prioritize security concerns.
- Real-world interaction data reveals distinct LLM biases.
Method
The study constructed a dataset from 82,843 real ChatGPT conversations containing code, then used OpenGrep with 648 rules for security analysis and zero-shot classification for user intent.
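The rule-based part of this method can be pictured with a toy stand-in. The sketch below is not OpenGrep and does not use its rule format. It applies two hypothetical, deliberately crude regex rules to hypothetical snippets and reports the fraction of snippets each rule flags. The study's own rules and denominators may well differ.

```python
import re
from collections import Counter

# Hypothetical rules for illustration only; real static analysis rules
# are syntax-aware and far more precise than these regexes.
RULES = {
    "weak-hash": re.compile(r"hashlib\.(md5|sha1)\("),
    "sql-string-concat": re.compile(r"execute\(\s*[\"'].*[\"']\s*\+"),
}

# Hypothetical snippets standing in for code extracted from conversations.
SNIPPETS = [
    "digest = hashlib.md5(data).hexdigest()",
    "cur.execute('SELECT * FROM t WHERE id = ' + user_id)",
    "digest = hashlib.sha256(data).hexdigest()",
    "cur.execute('SELECT * FROM t WHERE id = ?', (user_id,))",
]


def scan(snippet: str) -> set:
    """Return the names of the rules that match this snippet."""
    return {name for name, pattern in RULES.items() if pattern.search(snippet)}


def hit_rates(snippets) -> dict:
    """Fraction of snippets flagged by each rule."""
    counts = Counter()
    for snippet in snippets:
        counts.update(scan(snippet))
    total = len(snippets)
    return {name: counts[name] / total for name in RULES} if total else {}
```

Calling `hit_rates(SNIPPETS)` flags one of the four snippets per rule. A regex scanner like this misses many cases and raises false positives, which is one reason to use a dedicated static analysis tool in practice.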
In practice
- Analyze LLM-generated code with static analysis tools.
- Explicitly prompt LLMs for secure coding practices.
- Validate all generated package imports against repositories.
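The import-validation step in the list above can be sketched as follows. This is a minimal local stand-in, assuming that "cannot be resolved in the current environment" is a useful first signal. It does not query any package repository, and the function names are hypothetical.

```python
import ast
import importlib.util


def top_level_imports(source: str) -> set:
    """Collect the first path component of every absolute import in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names.add(node.module.split(".")[0])
    return names


def unresolved_imports(source: str) -> set:
    """Return imported modules that cannot be found in the current environment."""
    return {
        name
        for name in top_level_imports(source)
        if importlib.util.find_spec(name) is None
    }
```

A module reported here may simply be uninstalled rather than hallucinated, so each hit still needs a manual check against the official repository before anything is installed. Conversely, a name that exists in a repository is not automatically trustworthy.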
Topics
- LLM Code Security
- ChatGPT
- Static Code Analysis
- Software Vulnerabilities
- User Prompting
- Hallucinated Modules
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, AI Scientist, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.