AI is getting scary good at finding hidden software bugs - even in decades-old code

Source: News and Advice on the World's Latest Innovations | ZDNET · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cybersecurity & Data Privacy · Depth: Intermediate, short

Summary

Microsoft Azure CTO Mark Russinovich demonstrated that Anthropic's Claude Opus 4.6 model can perform a "security audit" of 1986 Apple II 6502 assembly code, identifying subtle logic errors, such as a missed carry flag check, that had lain dormant for decades. The demonstration shows that AI can reason about low-level control flow and CPU flags, even on obscure legacy architectures. That helps surface long-standing bugs, but experts such as Matthew Trifiro and Adedeji Olowe warn that it also expands the attack surface, letting bad actors systematically find and exploit vulnerabilities in unpatchable legacy systems. Large language models (LLMs) such as GPT-4.1 and Mistral Large are proving as effective as traditional static analysis tools at bug detection, and they complement existing tools such as SpotBugs and CodeQL. Companies such as Anthropic and Black Duck are already integrating LLMs into security analysis; Mozilla reports that Anthropic's Frontier Red Team found more high-severity bugs in Firefox in two weeks than humans typically find in two months. However, studies show that AI-generated code introduces 1.7 times more bugs than human-written code, including critical and major issues, and AI tools can flood open-source projects with bogus security reports. AI is therefore not yet a standalone replacement for human programmers or security professionals.

Key takeaway

CTOs and VPs of Engineering who manage extensive legacy systems should recognize that AI models like Claude Opus 4.6 can uncover decades-old vulnerabilities in unmaintained code. This gives defenders a powerful new tool for security audits, but it also lets malicious actors find and exploit flaws in those same systems. Prioritize assessing your exposure to unpatchable firmware, and consider replacing critical, vulnerable devices, because AI-driven attacks on these systems are becoming increasingly feasible.

Key insights

AI excels at finding obscure bugs in legacy code but also expands the attack surface for unpatchable systems.

Principles

Method

LLMs analyze code by identifying failure modes and attack paths, rather than just rule violations, effectively performing a security audit on low-level control flow and CPU flags.
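To make the "missed carry flag check" class of bug concrete, here is a minimal sketch in Python that mimics how an 8-bit CPU such as the 6502 adds a one-byte offset to a two-byte address. It is a hypothetical illustration, not the code from Russinovich's demonstration: the function names `adc`, `add16_buggy`, and `add16_fixed` and the sample addresses are assumptions chosen for this example.

```python
from typing import Tuple


def adc(a: int, b: int, carry_in: int = 0) -> Tuple[int, int]:
    """Add two bytes plus a carry-in; return (result byte, carry-out flag)."""
    total = a + b + carry_in
    return total & 0xFF, 1 if total > 0xFF else 0


def add16_buggy(addr: int, offset: int) -> int:
    """Add an 8-bit offset to a 16-bit address, ignoring the carry flag."""
    lo, _carry = adc(addr & 0xFF, offset)  # carry is produced but never checked
    hi = addr >> 8                         # high byte left unchanged: the bug
    return (hi << 8) | lo


def add16_fixed(addr: int, offset: int) -> int:
    """Same addition, with the carry propagated into the high byte."""
    lo, carry = adc(addr & 0xFF, offset)
    hi, _ = adc(addr >> 8, 0, carry)       # fold the carry into the high byte
    return (hi << 8) | lo


if __name__ == "__main__":
    # No page crossing: both versions agree.
    print(hex(add16_buggy(0x2010, 0x20)), hex(add16_fixed(0x2010, 0x20)))
    # Page crossing: the buggy version wraps within the same 256-byte page.
    print(hex(add16_buggy(0x20F0, 0x20)), hex(add16_fixed(0x20F0, 0x20)))
```

The two versions agree on most inputs and diverge only when the low-byte addition overflows, which is one reason a defect of this kind can survive for years: a rule-based check sees a valid addition, while a reviewer reasoning about the flag's data flow sees that the carry is never consumed.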

Best for: CTO, VP of Engineering/Data, AI Engineer, Software Engineer, Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by News and Advice on the World's Latest Innovations | ZDNET.