AI is getting scary good at finding hidden software bugs - even in decades-old code
Summary
Microsoft Azure CTO Mark Russinovich demonstrated that Anthropic's Claude Opus 4.6 AI model can perform a "security audit" on 1986 Apple II 6502 assembly code, identifying subtle logic errors, such as a missed carry flag check, that had remained dormant for decades. This capability highlights AI's proficiency in reasoning about low-level control flow and CPU flags, even on obscure legacy architectures. While this helps surface long-standing bugs, experts like Matthew Trifiro and Adedeji Olowe warn that it also expands the attack surface, enabling bad actors to systematically find and exploit vulnerabilities in unpatchable legacy systems. Large language models (LLMs) like GPT-4.1 and Mistral Large are proving as effective as traditional static analysis tools at bug detection, and they complement existing tools like SpotBugs and CodeQL. Companies such as Anthropic and Black Duck are already integrating LLMs into security analysis, and Mozilla reports that Anthropic's Frontier Red Team found more high-severity bugs in Firefox in two weeks than humans typically find in two months. However, studies show that AI-generated code introduces 1.7 times more bugs than human-written code, including critical and major issues, and AI tools can flood open-source projects with bogus security reports. AI is therefore not yet a standalone replacement for human programmers or security professionals.
Key takeaway
CTOs and VPs of Engineering managing extensive legacy systems should recognize that AI models like Claude Opus 4.6 can uncover decades-old vulnerabilities in unmaintained code. While this offers a powerful new tool for security audits, it also creates a significant risk, because malicious actors can use the same capability to exploit these systems. Prioritize assessing your exposure to unpatchable firmware and consider strategic replacements for critical, vulnerable devices, as AI-driven attacks on these systems are becoming increasingly feasible.
Key insights
AI excels at finding obscure bugs in legacy code but also expands the attack surface for unpatchable systems.
Principles
- AI complements traditional static analysis tools.
- AI-generated code introduces more bugs than human-written code.
Method
LLMs analyze code by identifying failure modes and attack paths, rather than just rule violations, effectively performing a security audit on low-level control flow and CPU flags.
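To make the "missed carry flag check" class of bug concrete, here is a minimal Python sketch. It is not the code from the Apple II program Russinovich audited, which this summary does not reproduce. It is a hypothetical model of a common 6502 pattern: advancing a 16-bit pointer stored as two 8-bit bytes. The function names (`add8`, `advance_pointer_buggy`, `advance_pointer_fixed`, `as_address`) are illustrative assumptions.

```python
# Hypothetical illustration only: models 8-bit addition with a carry flag,
# in the style of the 6502 ADC instruction (binary mode, decimal mode ignored).

def add8(a, b, carry_in=0):
    """Add two 8-bit values plus carry-in; return (8-bit result, carry-out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def advance_pointer_buggy(lo, hi, step):
    """Add step to the low byte and ignore the carry: the missed check."""
    lo, _carry = add8(lo, step)
    return lo, hi  # high byte never updated, so the pointer wraps within a page


def advance_pointer_fixed(lo, hi, step):
    """Add step to the low byte, then propagate the carry into the high byte."""
    lo, carry = add8(lo, step)
    hi, _ = add8(hi, 0, carry)
    return lo, hi


def as_address(lo, hi):
    """Combine low and high bytes into a 16-bit address."""
    return (hi << 8) | lo


if __name__ == "__main__":
    # Both versions agree until the low byte overflows past 0xFF.
    print(hex(as_address(*advance_pointer_buggy(0xF0, 0x20, 0x20))))
    print(hex(as_address(*advance_pointer_fixed(0xF0, 0x20, 0x20))))
```

The two versions behave identically for most inputs and diverge only when the low byte overflows, which is one plausible reason a bug of this kind can stay dormant for years and why finding it requires reasoning about flags rather than matching surface patterns.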
In practice
- Use LLMs to audit legacy codebases for hidden vulnerabilities.
- Integrate LLM-powered plugins with reverse-engineering tools like Ghidra.
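As a rough sketch of the first practice above, the following Python splits a legacy listing into chunks and wraps each one in an audit instruction that asks about failure modes and attack paths. Everything here is an assumption for illustration: the instruction wording, the chunk size, and the names `AUDIT_INSTRUCTION`, `chunk_listing`, and `build_audit_prompts`. Sending the prompts to a model is deliberately left out, since the article does not describe a specific API or plugin interface.

```python
# Hypothetical sketch: prepares audit prompts only; it does not call any model.

AUDIT_INSTRUCTION = (
    "You are auditing legacy assembly code. For the listing below, describe "
    "possible failure modes and attack paths, not just rule violations. "
    "Pay attention to control flow and CPU flags such as carry."
)


def chunk_listing(listing, max_lines=40):
    """Split a source listing into chunks of at most max_lines lines."""
    lines = listing.splitlines()
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]


def build_audit_prompts(listing, max_lines=40):
    """Return one audit prompt per chunk, labeled with its position."""
    chunks = chunk_listing(listing, max_lines)
    return [
        f"{AUDIT_INSTRUCTION}\n\nChunk {n} of {len(chunks)}:\n{chunk}"
        for n, chunk in enumerate(chunks, start=1)
    ]
```

Fixed-size chunking is the simplest choice but can cut a routine in half; splitting on labels or subroutine boundaries would likely give a model better context for control-flow reasoning.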
Topics
- AI-assisted Security Audits
- Large Language Models
- Software Vulnerability Detection
- Legacy Code Security
- Static Code Analysis
Best for: CTO, VP of Engineering/Data, AI Engineer, Software Engineer, Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by ZDNET.