An AI coding bot took down Amazon Web Services

· Source: AI - Ars Technica · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Intermediate, quick

Summary

Amazon Web Services (AWS) experienced two service disruptions that were attributed to "user error" in how engineers used AI coding tools rather than to the AI itself. The December incident affected a single service in mainland China; an earlier event did not impact customer-facing services. Both were linked to engineers using AWS's Kiro and Amazon Q Developer. In both cases, the engineers had elevated permissions and bypassed the peer review that Amazon typically requires. Following the December incident, AWS implemented new safeguards, including mandatory peer review and staff training. It continues to promote AI tool adoption, targeting 80% weekly usage among developers, despite some employee skepticism about the tools' utility and error risk. Both incidents were less severe than a 15-hour AWS outage in October 2025 that affected services such as OpenAI's ChatGPT.

Key takeaway

Engineering leaders evaluating AI coding assistant deployment should ensure that robust access controls and mandatory human oversight are in place. Do not grant AI tools or their operators elevated permissions that bypass standard review processes, as this significantly increases operational risk. Prioritize safeguards such as mandatory peer review and comprehensive staff training to mitigate potential disruptions and build confidence in AI-assisted development workflows.
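
As a rough illustration of the takeaway, the sketch below shows one way a deployment gate could enforce peer review even when the author holds elevated permissions. It is not Amazon's mechanism; the `ChangeRequest` type, its fields, and the `can_deploy` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """A proposed production change (hypothetical structure for illustration)."""
    author: str
    approvers: set[str] = field(default_factory=set)
    # Recorded for auditing only; the gate below never consults it.
    author_has_elevated_permissions: bool = False


def can_deploy(change: ChangeRequest, min_approvals: int = 1) -> bool:
    """Allow deployment only after independent human review.

    Elevated permissions are deliberately ignored, so they can never
    substitute for peer approval.
    """
    # An author approving their own change does not count as peer review.
    independent_approvers = change.approvers - {change.author}
    return len(independent_approvers) >= min_approvals
```

For example, `can_deploy(ChangeRequest("alice", approvers={"bob"}))` passes the gate, while a self-approved change from an author with elevated permissions does not. The design choice is that review is checked independently of privilege level, so elevated access alone cannot skip it.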

Key insights

AI tool-related outages stemmed from user error and access control issues, not inherent AI flaws.

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, MLOps Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.