GitHub outage caused by AI-driven traffic surge and configuration error

On February 9 of this year, late at night Beijing time, tens of millions of developers around the world opened GitHub and saw the same page.

Not a 404—something more anxiety-inducing: that yellow warning bar that sends chills down every engineer’s spine, paired with a row of status indicators turning from green to red.

github.com is down.

The API is down.

GitHub Actions is down.

Git operations have failed—even Copilot couldn't escape it.

That night, someone’s CI/CD pipeline stalled at its most critical stage, someone’s automated deployment hung mid-process, and someone waited for a PR that refused to merge—behind it all was a feature waiting to go live, waiting for real users.

GitHub later released an incident report. The root cause, in technical terms, was "an overload of the core database cluster responsible for authentication and user management." But behind these few words lies a shocking chain of triggering events—

Two days earlier, the engineering team had shortened the refresh interval for the "user settings cache" from 12 hours to 2 hours so that a new model could reach users faster. That single configuration value was the only thing that changed.

As a result, the cache rewrite, which was originally spread over 12 hours, was compressed into just two hours, creating an intense "cache rewrite storm" that instantly overwhelmed the asynchronous task queue, causing shared infrastructure components to crash. The cascade effect spread to the service responsible for proxying HTTPS Git operations, ultimately exhausting all platform connections.
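The effect of that one value can be sketched with simple arithmetic. The snippet below is a minimal illustration, assuming every cached entry is rewritten once per refresh window; the entry count is hypothetical and is not a figure from GitHub's report.

```python
def rewrites_per_second(entries: int, ttl_hours: float) -> float:
    """Average rewrite rate if every entry is refreshed once per TTL window."""
    return entries / (ttl_hours * 3600)


# Hypothetical number of cached user-settings entries, for illustration only.
ENTRIES = 100_000_000

before = rewrites_per_second(ENTRIES, ttl_hours=12)
after = rewrites_per_second(ENTRIES, ttl_hours=2)
multiplier = after / before

print(f"12h window: {before:,.0f} rewrites/s")
print(f" 2h window: {after:,.0f} rewrites/s")
print(f"load multiplier: {multiplier:.0f}x")
```

Whatever the real entry count is, the ratio depends only on the two windows: the same work squeezed into one sixth of the time means six times the sustained write rate on everything downstream.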

A number changed from 12 to 2.

GitHub was brought down by a configuration change it made itself.

But if you only see this one configuration change, you’ve likely missed the most important part of the story.

01 Not a single accident, but ten accidents.

The incident on February 9 was not an isolated event.

In fact, during the first three months of 2026, GitHub experienced at least eight major outages. In February alone, there were 37 recorded incidents of varying severity. GitHub’s CTO, Vlad Fedorov, later acknowledged in a blog post that during these two months, GitHub failed to maintain the “three nines” — 99.9% availability — it had promised to enterprise customers.

Reviewing the incident logs from the past two months, you'll notice a curious pattern: each outage appears to have a different cause.

February 2: An issue with the Azure compute provider caused GitHub Actions to be down for nearly four hours, affecting Copilot coding assistants, CodeQL, and Dependabot.

February 9: Cache stampede, authentication database overload.

March 5: Redis cluster outage caused 95% of GitHub Actions workflows to fail to start within 5 minutes, with an average delay of 30 minutes.

March 18: Webhook latency surged to 32 times the normal level.

Each incident appeared to be an "accident," with different immediate causes each time. But Fedorov’s explanation tied them together into a single narrative. He said these incidents shared three common structural causes: “rapid load growth, tight coupling between services leading to the spread of localized failures, and a lack of system protection against anomalous client traffic.”

In engineering terms, the foundation of GitHub is beginning to show cracks under the weight of new loads.

And this "new load" has a specific name.

02 275 million commits per week

Key data

Total commits for 2025: approximately 1 billion

Weekly commit volume in 2026: 275 million

At this rate, the full-year estimate for 2026 is 14 billion (a 14-fold year-over-year increase).

GitHub Actions compute: 5 billion minutes per week in 2023 → 10 billion in 2025 → 21 billion minutes in a single week in early 2026

If you were an infrastructure engineer at GitHub, comparing the monitoring dashboards from 2025 and 2026 would likely leave you stunned.

Throughout 2025, GitHub processed approximately 1 billion code commits. This number alone is substantial, the result of years of growth on the GitHub platform. But by 2026, weekly commits reached 275 million. Translated annually—at this pace—the total number of commits in 2026 would approach 14 billion, a full 14 times the total for all of 2025.
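The annualized figure is a straight-line projection, which can be checked in a few lines. This sketch assumes the weekly rate stays flat for 52 weeks, which is the same simplification the estimate above makes.

```python
WEEKS_PER_YEAR = 52

commits_2025 = 1_000_000_000        # approximate full-year total for 2025
weekly_commits_2026 = 275_000_000   # weekly rate cited for 2026

# Straight-line projection: assumes the weekly rate neither grows nor shrinks.
projected_2026 = weekly_commits_2026 * WEEKS_PER_YEAR
growth = projected_2026 / commits_2025

print(f"projected 2026 commits: {projected_2026:,}")
print(f"multiple of 2025 total: {growth:.1f}x")
```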

This is not a smooth growth curve, but a steep cliff. The computational demand on GitHub Actions illustrates this even more clearly: it consumed 5 billion minutes per week in 2023, doubled to 10 billion in 2025, and then surged directly to 21 billion minutes in a single week in early 2026.

Who is frantically committing all this code?

Not human developers.

GitHub data shows that AI agents are now the platform’s most active "users." Claude Code alone accounts for 4.5% of all public repository commits on GitHub: 2.6 million commits per week, up from just 100,000 at the end of September 2025, roughly a 25-fold increase in three months.

The number of PRs opened by AI agents is also exploding. In September 2025, AI-generated PRs totaled around 4 million per month; by March 2026, this number surged to 17 million—more than quadrupling in just six months.

Here is a picture that helps explain what this means.

Previously, GitHub's "users" were primarily human programmers. They worked during the day, slept at night, took weekends off, thought carefully before each commit, hesitated at times, and had a limit to their typing speed. System load followed human daily routines, exhibiting predictable peaks and valleys.

Now, an increasing number of "users" are AI agents. They don’t sleep, rest, or hesitate; multiple parallel agents can be launched for a single task, each capable of submitting more in an hour than a human engineer can accomplish in a week. More importantly, they are not just submitting code—they are continuously creating new repositories, treating repositories as output products of workflows rather than workspaces for humans.

GitHub’s infrastructure engineers are no longer facing a higher-volume version of the same problem, but rather a fundamentally different one.

03 Copilot has run out of funds.

Frequent outages are only one side of the problem. GitHub has another, more troubling issue: when it did the accounting, it found it was losing money.

The original pricing logic for Copilot was based on a reasonable assumption: users primarily engage in "assistive completion," with each interaction being brief and computationally predictable. The personal plan at $10 per month and the business plan at $19 per month, both charged per seat, worked well for the past few years.

Then, Agentic AI arrived.

Agentic workflows and traditional completion are two different species. Standard code completion involves linear, predictable requests with brief computational cycles. In contrast, an Agentic coding session may run for hours, launching multiple parallel threads to perform multi-step reasoning, self-correction, and cross-repository refactoring. The tokens consumed in a single session can cost more than an average user’s entire monthly subscription fee.

GitHub is facing a situation where a small number of heavy Agentic users are consuming computing resources worth hundreds of dollars per month for just a few dollars in fees.

In response to this situation, GitHub's reaction was straightforward—first, implement traffic control, then adjust pricing.

At the beginning of this year, GitHub implemented two parallel rate-limiting mechanisms for Copilot: a maximum session duration and a weekly usage limit, both calculated based on token consumption multiplied by model computation weights. At the same time, new user registrations for certain individual Copilot plans were paused.
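A weekly limit of this kind can be sketched as a running total of weighted token units. The following is a minimal illustration, not GitHub's implementation: the model names, weights, and limit are hypothetical, since no actual values are given here.

```python
from dataclasses import dataclass

# Hypothetical weights and limit, chosen only for illustration.
MODEL_WEIGHTS = {"small-model": 1.0, "large-model": 5.0}
WEEKLY_LIMIT = 1_000_000  # weighted token units per user per week


@dataclass
class WeeklyUsage:
    used: float = 0.0

    def charge(self, model: str, tokens: int) -> bool:
        """Record a request; return False if it would exceed the weekly limit."""
        cost = tokens * MODEL_WEIGHTS[model]  # tokens multiplied by model weight
        if self.used + cost > WEEKLY_LIMIT:
            return False
        self.used += cost
        return True


usage = WeeklyUsage()
first = usage.charge("small-model", 400_000)   # costs 400,000 units
second = usage.charge("large-model", 100_000)  # costs 500,000 units
third = usage.charge("large-model", 50_000)    # would cost 250,000 units
print(first, second, third, usage.used)
```

The point of the weight is that the same number of tokens counts for more on a more expensive model, so a heavy agentic user hits the ceiling sooner than a light completion user.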

On June 1, GitHub completed a more fundamental pricing overhaul: Copilot fully transitioned to pay-as-you-go billing, replacing subscription plans with AI Credits, where 1 AI Credit equals $0.01, with usage calculated in real time based on token consumption.
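The credit arithmetic itself is simple. In the sketch below, only the $0.01 value of one AI Credit comes from the text above; the credits-per-million-tokens rate is a hypothetical placeholder, because real rates would depend on the model.

```python
from decimal import Decimal

USD_PER_CREDIT = Decimal("0.01")  # stated above: 1 AI Credit equals $0.01


def credits_for(tokens: int, credits_per_million_tokens: Decimal) -> Decimal:
    """Convert token consumption into AI Credits at a given (assumed) rate."""
    return Decimal(tokens) / Decimal(1_000_000) * credits_per_million_tokens


def credits_to_usd(credits: Decimal) -> Decimal:
    return credits * USD_PER_CREDIT


# Hypothetical rate: 300 credits per million tokens.
session_credits = credits_for(2_500_000, Decimal(300))
session_usd = credits_to_usd(session_credits)
print(f"{session_credits} credits = ${session_usd:.2f}")
```

Under these assumed numbers, a single long session of 2.5 million tokens would be billed on its own rather than absorbed into a flat monthly fee.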

The era of charging by seat has come to an end in the face of Agentic AI.

This shift isn't just GitHub's problem. It's a collective pricing crisis sweeping the AI tools industry in 2026—when AI begins replacing humans in entire workflows, rather than merely assisting them, all subscription models based on “per user per month” will no longer work.

04 30x, not 10x

Returning to the infrastructure issue: How exactly does GitHub plan to address this 14-fold growth?

Here is a detail that illustrates the severity of the issue:

In late December 2025, Agentic workflows suddenly began accelerating. GitHub engineers realized that planning for a 10x increase was not enough. By February 2026, following the major outage, GitHub announced it needed to redesign its architecture for 30 times today’s scale.

This is not scaling up. It is a complete redesign.

The difference between these two terms is significant. Scaling up means adding more machines or increasing memory for existing databases—keeping the same direction, only increasing the scale. Redesigning means that the current architectural assumptions would systematically fail at 30 times the scale, requiring a fundamental rethinking of service decomposition, data flow, and fault isolation from the ground up.

The specific directions disclosed by GitHub include decoupling critical services to prevent cascading failures, implementing backpressure mechanisms and traffic degradation capabilities, deploying dedicated hosts for hotspot services, eliminating single points of failure, and improving change management—avoiding direct deployment of changes such as "reducing cache TTL from 12 hours to 2 hours" without adequate load testing.
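Of these directions, backpressure is the easiest to show in miniature. The sketch below is a generic illustration using Python's standard `queue` module, not GitHub's design; the class name and the backlog size are hypothetical. The idea is that a bounded queue refuses new work once its backlog is full, so a burst is pushed back to the caller instead of piling up until shared components fail.

```python
import queue


class BoundedTaskQueue:
    """Reject new work once the backlog is full instead of letting it grow."""

    def __init__(self, max_pending: int) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=max_pending)
        self.rejected = 0

    def submit(self, task: str) -> bool:
        """Try to enqueue a task; return False when the queue is full."""
        try:
            self._q.put_nowait(task)
            return True
        except queue.Full:
            self.rejected += 1  # the caller is expected to back off and retry
            return False

    def pending(self) -> int:
        return self._q.qsize()


# A burst of five cache-rewrite tasks hits a queue that holds at most three.
tasks = BoundedTaskQueue(max_pending=3)
results = [tasks.submit(f"rewrite-{i}") for i in range(5)]
print(results, tasks.pending(), tasks.rejected)
```

Rejecting the overflow is a deliberate trade: some requests are delayed or degraded, but the queue and everything sharing its infrastructure stay within a known bound.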

It is worth noting that GitHub is not alone.

Stripe has already encountered issues with AI agents creating accounts in bulk, and AWS is building dedicated identity, logging, and production control systems for agents. These actions are not preemptive—they are responses to clear signals already appearing on monitoring dashboards.

GitHub was just the first to buckle: it sits at the very core of the AI toolchain.

05 Code repositories are becoming the exhaust pipes of AI.

Take a moment to consider the nature of the entire situation.

What is GitHub? The most straightforward answer is that it’s where programmers store their code. But on a deeper level, it’s the infrastructure for human software collaboration—commits are the trail of collaboration, pull requests are containers for discussion, issues preserve intent, and actions are pipelines for execution. The entire system is designed around human work rhythms, thought processes, and collaboration patterns.

AI agents have changed all of this.

When an AI agent can commit code hundreds of times in a single day, and behind each "commit" there is no human thought or deliberation, only one step in a task loop, can the code repository still be considered a "container for collaboration"?

When AI tools automatically generate repositories, open pull requests, run CI, and merge automatically—is the developer still the main actor in this process, or have they been reduced to merely being a "reviewer" or even a "bystander"?

GitHub’s CTO described the crisis using the term “rapidly growing load.” But this term likely underestimates the nature of the problem—it’s not just an increase in volume, but a qualitative shift in usage. Under the old model, GitHub was a “tool for developers”; under the new model, GitHub is becoming an “exhaust pipe for AI,” an output pipeline for automated workflows.

What this means for GitHub remains an open question. Scaling 30x can resolve traffic issues, but it cannot redefine the business model or solve the identity question of “Who are my true users?”

Recently, a rather telling phenomenon has emerged: after experiencing outages, GitHub has published numerous detailed engineering blogs, thoroughly explaining the root causes of each incident—with a level of transparency that is almost surprising. Some believe this is GitHub proactively building trust, while others see it as trading transparency for the patience of the developer community, as more instability is likely during the upcoming restructuring phase.

A platform that has been overwhelmed by its own success must dismantle and rebuild itself—and this very process is also a test of whether it can endure.

On the night of February 9, the engineer waiting for the PR to be merged likely finally got the green light. But he may not have realized that the outage he endured was not an accident on GitHub’s part, but a signal heralding the software development industry’s entry into a new era.

This article is from the WeChat public account "GeekPark" (ID: geekpark), author: AstronautApe.