GitHub availability report: May 2026
Summary
GitHub's May 2026 availability report covers nine incidents and the ongoing infrastructure work to improve resilience under rapid traffic growth, driven largely by AI-assisted workflows. The company has migrated 40% of its monolith traffic to Azure, up from 8% in February, along with 30% of Git traffic and 99% of repository replication, effectively doubling capacity in four months. Key structural changes include isolating the primary database cluster, fully cutting over to the new users service, and rolling out stateless authentication tokens. May's incidents included a database schema migration that caused a 1.3% 5xx error rate on May 4, GitHub Actions hosted runner degradations on May 5 and 6 that affected 13.5% of jobs, and a 32-bit integer key overflow on May 6 that halted the creation of new pull request review threads. Other incidents involved Copilot session failures, Actions workflow disruptions caused by a service account suspension, and Copilot degradation from an upstream API problem affecting the GPT-5.2, GPT-5.3-Codex, GPT-5.4, and GPT-5.5 models.
Key takeaway
For AI Architects and MLOps Engineers scaling critical services, GitHub's May 2026 report underscores the need to build infrastructure resilience proactively. Prioritize service isolation, robust database management, and comprehensive monitoring to prevent cascading failures. Implement dynamic throttling for migrations and maintain allowlists for critical service accounts to mitigate the incident types seen in the report. Review your dependency resilience and automated failover strategies, especially for AI models, so that services stay available during rapid growth.
Key insights
GitHub prioritizes availability and capacity by re-architecting its infrastructure to support rapid AI-driven traffic growth.
Principles
- Prioritize availability, then capacity, then features.
- Isolate services to prevent cascading failures.
Method
Transform infrastructure by migrating to Azure, breaking monoliths into isolated services, and eliminating shared failure points, for example by isolating the database cluster and adopting stateless authentication tokens.
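To illustrate why stateless tokens remove a shared failure point, here is a minimal sketch of a token that can be verified from a signing secret alone, with no lookup in a shared session store. It is not GitHub's token format: the function names `issue_token` and `verify_token`, the claim names `sub` and `exp`, and the HMAC-SHA256 scheme are all assumptions chosen for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> str:
    """URL-safe base64 encoding as text."""
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(secret: bytes, user_id: int, ttl_s: int,
                now: Optional[float] = None) -> str:
    """Build a signed token carrying its own user id and expiry (hypothetical format)."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"sub": user_id, "exp": int(issued) + ttl_s}, sort_keys=True
    ).encode("utf-8")
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(sig)


def verify_token(secret: bytes, token: str,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or malformed base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if claims["exp"] <= current:
        return None
    return claims
```

Because verification needs only the secret and the token itself, any service instance can authenticate a request even when a central session database is slow or unavailable. The trade-off in a scheme like this is that a token cannot be revoked before its expiry without reintroducing some shared state.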
In practice
- Implement dynamic throttling and circuit breakers for database migrations.
- Add allowlists for critical service accounts to prevent suspension.
- Strengthen monitoring for column size limits and replication lag.
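Two of the practices above can be sketched in a few lines. The example below is a hedged illustration, not GitHub's implementation: `MigrationThrottle`, `key_headroom`, and every threshold value are hypothetical, and how replication lag or the current maximum key is measured is left to the caller. The throttle lengthens the pause between migration batches as replication lag rises and trips open (stops the migration) once lag crosses a hard limit. The headroom check reports how much of a signed 32-bit key space remains, so an alert can fire well before an overflow.

```python
from dataclasses import dataclass
from typing import Optional

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer key can hold


def key_headroom(current_max_id: int, limit: int = INT32_MAX) -> float:
    """Return the fraction of the key space still unused, from 0.0 to 1.0."""
    return max(0.0, 1.0 - current_max_id / limit)


@dataclass
class MigrationThrottle:
    """Slows a batched migration as lag rises and stops it past a hard limit."""
    target_lag_s: float = 1.0  # assumed: at or below this lag, run at full speed
    max_lag_s: float = 10.0    # assumed: at or above this lag, trip the breaker
    base_pause_s: float = 0.1  # assumed: pause between batches at full speed
    tripped: bool = False

    def next_pause(self, lag_s: float) -> Optional[float]:
        """Return seconds to wait before the next batch, or None to halt."""
        if self.tripped or lag_s >= self.max_lag_s:
            self.tripped = True  # stays open until an operator calls reset()
            return None
        if lag_s <= self.target_lag_s:
            return self.base_pause_s
        # Scale the pause in proportion to how far lag exceeds the target.
        return self.base_pause_s * (lag_s / self.target_lag_s)

    def reset(self) -> None:
        """Close the breaker so the migration may resume."""
        self.tripped = False
```

A migration loop would call `next_pause` with the latest measured lag before each batch, sleep for the returned duration, and abort when it gets `None`. Keeping the breaker open until an explicit `reset` is a deliberate choice in this sketch, so that a migration does not resume on its own after causing an incident.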
Topics
- GitHub Availability
- Infrastructure Resilience
- Azure Migration
- AI Development Workflows
- Database Management
- GitHub Actions
- GitHub Copilot
Best for: CTO, VP of Engineering/Data, AI Engineer, MLOps Engineer, AI Architect, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by The GitHub Blog.