AI · Data

Copilot's bill went up 50x for some developers. "Some" is doing all the work in that sentence.

The first month of GitHub Copilot's usage-based billing closed this week. The screenshots are real and the panic is loud. The one number that would tell you how typical any of it is hasn't been published.

Phuong Nguyen·Jul 1·8 min

Lines of source code displayed on a screen.

Image: Martin Vorel, via Wikimedia Commons (CC BY-SA 4.0)

On the last day of June, GitHub Copilot's first full month of usage-based billing closed, and the internet did what it does best: it reposted the worst case. A bill that went from about $29 to roughly $750. A screenshot of $50 becoming $3,000. A tidy, repeatable headline — costs up "10x to 50x" for developers running agents on frontier models. The numbers are alarming, the screenshots appear genuine, and the anger is completely understandable. None of which tells you the thing you'd actually want to know before forming an opinion: how typical is any of it?

That question — typical compared to what, measured how, across how many people — is the one the viral version skips, because a distribution does not screenshot as well as a catastrophe. So let me do the boring thing and read the footnote on the record. Not to defend Microsoft, which can afford its own lawyers, but because a price change this large deserves better evidence than a feed sorted by outrage.

What actually changed on June 1

The mechanism first, because half the panic is just unfamiliarity. On June 1, Copilot moved from flat monthly plans to consumption billing denominated in "AI credits." Each paid plan now ships with a monthly allotment of credits roughly equal to its price — Copilot Pro at $10 a month carries about $10 in credits, Pro+ at $39 carries $39, Business at $19 per user, Enterprise at $39 per user — and once you exhaust the allotment, you either cap usage or pay for more. Credits are spent against token consumption: input, output, and cached tokens, charged at each model's published API rate. Heavier and more capable models burn credits faster, which is the entire point of the redesign.

Two details get amputated from the scary version and both matter. First, ordinary code completions and next-edit suggestions — the autocomplete that is, for most developers, the actual product — remain included and consume no credits. The metered part is chat and, above all, agents: the autonomous, multi-step sessions that call a frontier model over and over. Second, GitHub gave Business and Enterprise customers enhanced included usage through the summer ($30 and $70 a month respectively, rather than $19 and $39) and shipped spending controls at the org, cost-center, and user level, so an administrator can set a hard ceiling. A bill cannot run to $3,000 unless someone left the ceiling off. That is not a defense of the pricing. It is a fact about the failure mode, and it points at who actually hits it.

Compared to what?

Now the part I care about. "Costs rose 10x to 50x" is the sentence doing the heavy lifting, and it is built on sand in three separate places.

The baseline is slippery. A jump "from $29" doesn't map cleanly onto any published Copilot plan — Pro is $10, Pro+ is $39 — so the starting number in the most-shared example is already an artifact of one person's particular setup, not a list price you can check. When the denominator is unverified, the multiple computed from it is decoration.
The sample is self-selected and tiny. The people posting bills are, definitionally, the people with shocking bills. Nobody screenshots an invoice that came in at $11. We are looking at the right tail of a distribution and reading it as the mean, which is the oldest mistake in applied statistics and the easiest one to go viral with.
The workload is doing the spending, not the plan. By the admission of the complainants' own critics — and some of the complainants — the eye-watering totals come from agentic "vibe coding": long autonomous runs, bloated context, frontier models invoked in loops. As one developer put it bluntly, the only way it gets that crazy is if you are burning tokens on a ton of iterations. That is a usage pattern, not a population.

Put those together and "10x to 50x" stops being a finding and becomes a description of the worst observed case among the loudest observed users running the most expensive possible workflow. That can be simultaneously real and useless as a guide to what the typical developer will pay. A range that wide — 10x to 50x is itself a fivefold spread — is a confession that no one quoting it has a central estimate. If you knew the median, you would cite the median.

"Costs rose 10x to 50x" is not a measurement. It's the worst case, from the loudest users, running the most expensive workflow — reported as if it were the average.

The number nobody has published

Here is the test I would apply to any claim like this: what single number would settle it, and does anyone have it? In this case the number is the distribution of actual monthly bills now that the first cycle has closed — the median, the 90th percentile, the share of paid seats that exceeded their included credits at all. GitHub has that data exactly, to the cent, for every account. It has not published it. Microsoft did not respond to reporters' requests for comment. So the most authoritative source on whether this change is a modest true-up or a broad price shock is sitting on the answer, and the vacuum is being filled by the tail of the sample.

Until that distribution exists in public, the honest statements are narrow ones. It is true that some heavy agentic users now pay dramatically more. It is true that the people most exposed are precisely the power users the AI-coding boom has been celebrating. It is not established — not by anything I can find — that the median Copilot subscriber's bill rose at all, and there is a decent structural reason to suspect it didn't: if completions stayed free and your included credits cover your chat usage, your bill is your old bill. The plural of screenshot is not data.

What the shock actually measures

There's a second reading of all this that the outrage frame misses, and one of the angrier posts stumbled right into it: "how much money was Copilot losing?" That is the real signal in the noise. A flat $10 or $39 plan that let a user drive a frontier model through thousands of autonomous calls was selling a dollar for less than a dollar. The shock of the metered bill is, in large part, a measurement of how heavily the flat plan was subsidizing heavy use — a subsidy paid for by the light users who rarely touched an agent. Usage pricing didn't invent that cost. It revealed it, and reassigned it to the people generating it.

That is a legitimate thing to be angry about, but it is a different complaint than "Copilot got 50x more expensive," and conflating the two is how a pricing reallocation gets reported as a price gouge. The critics who blame GitHub for "making it easier and easier to burn tokens" have the better point: the product was engineered to encourage exactly the agentic patterns that are now expensive, and a meter that only becomes visible after the bill arrives is a genuine design failure. A monthly cap and a forecast would have turned a billing shock into a budgeting decision. The economics were always going to bite; the surprise was the avoidable part.

How to read your own bill

Since the published distribution doesn't exist yet, the only error bars you can trust are your own. If you want to know whether the new model costs you more, the check is unglamorous and takes ten minutes:

Find your actual credit consumption for June in the usage dashboard, not your projected one — a meter that updates only on reload has been a recurring complaint, so refresh it.
Separate completions (free) from chat and agent calls (metered). If you live in autocomplete, the metered line may be near zero.
Note which models you invoked. Frontier models at full reasoning effort are the expensive line; a cheaper model on routine work changes the total far more than the plan choice does.
Set a spending cap now, before next month, so the question becomes a budget you chose rather than an invoice you discover.
Compare against your real prior cost — your old flat fee — not against a number from someone else's screenshot.

Do that and you'll have something the feed can't give you: a sample size of one that is at least the right one. The broader verdict has to wait for GitHub to publish what it already knows, and the fact that it hasn't is its own small data point. Companies release the distribution when the distribution flatters them. The silence is not proof the change was brutal — inferring that would be the same error in reverse — but it is the reason the brutal anecdotes get to stand in for the whole. If a chart would calm everyone down, and the only people who can draw it won't, look harder at why.

Copilot's bill went up 50x for some developers. "Some" is doing all the work in that sentence.

What actually changed on June 1

Compared to what?

The number nobody has published

What the shock actually measures

How to read your own bill

References

Read next

Washington built a machine to gate frontier AI. The capability it fears just shipped as a free download.

I asked my coding agent to fix a bug. It ran a stranger's command instead.

Anthropic told the Senate Alibaba ran 28.8 million "attacks." That number counts traffic, not theft.

One email. Every Friday.