The coding-agent war

How coding quietly became AI's killer app, and its fiercest battleground

Anthropic's confidential IPO at a reported $965B valuation was built on Claude Code, not the chatbot. Now Microsoft and Google are pivoting hard into coding models — and the real fight is over the tool you actually type into.

Daniel Vance · Jun 2 · 10 min

Lines of source code on a screen — the surface where the AI coding-agent war is being fought.

Martin Vorel / Wikimedia Commons (CC BY-SA 4.0)

On a Tuesday in May I asked four coding agents to do the same boring job: take a small Node service of mine, swap a deprecated logging library for a new one, and update the dozen-odd files that imported it. Not a hackathon stunt. The kind of chore that eats an afternoon. Claude Code did it in one pass and ran the tests without being asked. OpenAI's Codex did it too, a touch slower, and left a tidy note about an edge case I had forgotten. GitHub Copilot's agent got most of it, then stalled on a session error I have learned to recognize on sight. Google's Jules went away to a cloud VM, thought about it for four minutes, and came back with a confident diff that had quietly renamed a variable I needed.

That is the whole industry in one afternoon. Four products, roughly the same task, four different relationships with the word "done." And the reason every large technology company on earth is now sprinting at this exact problem is that the afternoon is worth a fortune.

On June 1, Anthropic filed a confidential S-1 with the SEC, pricing itself for a public debut at a reported $965 billion valuation after a $65 billion raise — a number that, for the first time, edges past OpenAI. The same day, CNBC reported that Microsoft and Google are pivoting hard into coding models, with Microsoft Build opening in San Francisco the very next morning. I will leave the valuation to people who read prospectuses for pleasure. What interests me is the thing underneath the number, the part that survives contact with a normal week: coding became the killer app, and the tool you actually type into has become the battleground.

How a terminal program ate the model race

For two years the AI story was the chatbot — the consumer assistant, the thing your aunt asks about. The money turned out to be somewhere far less glamorous: a command-line program that writes and edits code while you watch. Claude Code launched publicly in May 2025. By November it was at a reported $1 billion annualized run rate, and by February 2026 past $2.5 billion, which is to say it grew into a large public-software-company-sized business in roughly nine months. Anthropic now says Claude Code is more than half of its product revenue. The chatbot paid the bills; the coding tool built the empire.

The market data tells the same story from the buyer's side. Coding is now reported to be around 51% of all enterprise generative-AI usage — the single largest category of what companies actually do with these models. And within that category Anthropic has opened the widest lead any vendor holds in any workload: somewhere between 42% and 54% of the enterprise coding market depending on whose count you trust, against OpenAI's roughly 21%, and that lead widened over the past six months rather than narrowing. Deloitte rolling Claude out to something like 470,000 employees is the kind of number that explains an IPO.

Why coding and not, say, email? Because code is the rare domain where the machine can check its own homework. A sentence is either persuasive or it isn't, and you need a human to judge. A function either compiles, passes its tests, and does the thing, or it doesn't — and the agent can run those tests itself, read the failure, and try again. That feedback loop is the whole moat. It is why an agent can be trusted to grind through a multi-file refactor unsupervised in a way it cannot be trusted to write your performance review. Anthropic has been explicit that the goal is to push the automation until Claude is correcting its own mistakes rather than handing you a draft to clean up. When that loop works, you feel it on the Tuesday.
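
For readers who want the loop spelled out, here is a minimal sketch of it in Python. It is an illustration under stated assumptions, not how Claude Code or any other product is actually built: `propose_patch` and `run_tests` are hypothetical stand-ins for a model call and a test runner, and the toy functions at the bottom exist only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    passed: bool
    output: str  # failure text the agent can read on its next attempt


def repair_loop(
    propose_patch: Callable[[Optional[str]], str],
    run_tests: Callable[[str], RunResult],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first patch that passes the tests, or None if none does."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        patch = propose_patch(feedback)  # hypothetical model call
        result = run_tests(patch)        # hypothetical test runner
        if result.passed:
            return patch
        feedback = result.output         # feed the failure into the next try
    return None


# Toy stand-ins (assumptions for illustration only).
def fake_propose(feedback: Optional[str]) -> str:
    # The first attempt has no feedback and gets it wrong.
    return "fixed" if feedback else "broken"


def fake_run(patch: str) -> RunResult:
    if patch == "fixed":
        return RunResult(True, "")
    return RunResult(False, "ImportError: old_logger")


if __name__ == "__main__":
    print(repair_loop(fake_propose, fake_run))
```

The point of the sketch is the shape, not the details: the pass/fail signal comes from the tests rather than from a person, which is what lets the loop run without one. The cap on attempts is the design choice that matters, since a loop with no cap is how a chore turns into a bill.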


There is a tidy, slightly uncomfortable detail buried in the strategy: most of the software at Anthropic is now reportedly written by Claude, and most of the code in Claude Code itself was written by Claude. The product is its own best customer and its own best test rig. Dario Amodei has said the company saw "80x growth per year" in revenue and usage in the first quarter, against a plan for 10x. Whatever you make of the showmanship, the dogfooding is real, and it shows in the seams — the tool feels built by people who use it all day, which is rarer than it should be.

What's actually ahead, judged in practice

Benchmarks will tell you these tools are separated by a few points on some eval. In practice the difference is temperament. Here is how the four shake out after living with them on real work, not demos.

Claude Code is the one I reach for without thinking. It is the best at the unglamorous middle of a task: reading a codebase it has never seen, making a plan, running the tests, noticing it broke something three files away. It is also the one most likely to charge ahead and do more than you asked, which is a feature until it's a bill.

OpenAI's Codex, now riding GPT-5.5, has closed most of the gap and in some agentic-coding work pulls ahead. It was named a Leader in Gartner's 2026 quadrant for coding agents, and it shows: it is careful, it leaves good notes, and its computer-use chops mean it can drive a browser or an app when the job spills outside the editor. It feels like the tool that read the manual.

GitHub Copilot is everywhere (still the default at 56% of large enterprises by one count), and that is exactly its problem. It got to the party first and is now defending ground, not taking it. More on its bad month below.

Google's Jules and Antigravity are the most interesting bet and the least finished. The async model (fire a task at a cloud VM, walk away, come back to a diff) is genuinely a different shape of work. When it lands it's lovely. When it doesn't you've lost five minutes to a confident wrong answer you didn't watch happen.

The catch with all four is the same one that haunts every agent: the handoff. A coding agent looks superhuman on a clean, contained task in a tidy repo. It gets human fast the moment the job touches your messy reality — the undocumented build step, the flaky test that fails for reasons unrelated to the change, the config that lives in someone's head. The gap between the demo repo and your repo is where the afternoon you were promised back quietly comes due.

Microsoft and Google bring the balance sheet

Which brings us to this week, and to the most telling pivot of the lot. At Build, opening June 2, Microsoft unveiled Project Polaris — its own in-house coding model, reportedly set to replace the OpenAI model as the default reasoning engine behind GitHub Copilot starting in August, with an optional fallback period for teams that aren't ready. Read that sentence again. Microsoft, OpenAI's largest backer, is building its own coding brain so it can stop renting one for its flagship developer product. Polaris reportedly leans on chain-of-thought and tree-of-thought reasoning for multi-file refactors, and ships with a Code Content Guarantee indemnifying customers against IP claims — a very enterprise-sales kind of feature, aimed squarely at the lawyer who signs the contract, not the developer who runs the tool.

The timing is not subtle. Microsoft owned the early lead here — Copilot more or less invented the category — and watched Claude Code take the ground it thought it had won. Worse, it had a rough month doing it. On May 19, a GitHub incident knocked out a reported 13% of Copilot API requests and a quarter of remote agent sessions at peak; a week later an Actions outage took down Copilot's coding agent along with half of GitHub's plumbing. In the past year GitHub has logged something like 48 major outages. Mitchell Hashimoto pulled his Ghostty project off GitHub in April after eighteen years, saying it was no longer a place for serious work. When you're selling an agent that's supposed to run unattended, "is it up right now" is not a footnote. It's the product.

Google, for its part, is doing the thing Google does: arriving late, with better infrastructure and a lower price. Sundar Pichai admitted at I/O that the company is "a bit behind at this moment" on agentic coding, a startling thing for a CEO to say out loud and, to his credit, the most honest line of the season. Google's answer is Antigravity 2.0, an agent-first development platform, plus the Jules async agent and a Gemini CLI now folding into an Antigravity CLI, all bundled under a $100-a-month Ultra plan that undercuts the competition on usage limits. The pitch is plain: same neighborhood of capability, more runs for your dollar, and a cloud you may already pay for.

Both counter-attacks share an assumption worth naming, because it's the whole strategic thesis: that the model is becoming a commodity and the distribution is the prize. Microsoft has the IDE, the cloud, and the enterprise contracts. Google has the cloud, the price, and a billion-user funnel. The bet is that a coding model just has to be good enough, and then the company that owns where developers already live will win on convenience. Anthropic's counter-bet is that good enough isn't, that the last 15% of reliability is the entire game, and that developers will route around any amount of bundling to use the tool that actually finishes the job. This week is the opening exchange of that argument.

What it means on your Tuesday

If you write code for a living, the immediate effect of all this is good and slightly exhausting. There are now at least four serious agents competing for the privilege of doing your chores, which means prices are under pressure, usage limits keep rising, and the tools improve on something like a monthly cadence. The Gemini Ultra plan and Microsoft's Copilot bundling will push everyone's pricing around. That is the upside of a war: the combatants compete by being useful to you.

The part nobody on a keynote stage will tell you is that switching costs are quietly becoming real. Each of these tools wants you in its world — its CLI, its config, its memory of your project, its way of planning a task. The more you teach one, the more it costs to leave. That is by design, and it is the actual moat under the moat. I have a graveyard of cancelled subscriptions, and I can already feel which of these I would resent paying for and which I'd resent losing. Claude Code is in the second pile. Codex is climbing into it. The other two are still earning their keep.

My honest read, after the afternoon and the weeks around it: Anthropic is ahead because it built the tool first and the business second, and you can feel that priority in the product. OpenAI is the credible challenger and the one I'd watch most closely, because GPT-5.5 narrowed the gap faster than I expected. Microsoft has the most to lose and just made the boldest move to stop losing it; whether Polaris is as good as the press release on a real codebase is exactly the thing a release can't tell you, and I'll be running it through the same boring refactor the week it ships. Google has the longest road and, for once, the humility to admit it.

For now, the verdict is the one I keep arriving at with every agent I've lived with. These tools are extraordinary at the task they can see clearly and ordinary the moment the task gets messy — which is to say, the moment it becomes your actual job. The companies are fighting over who gets to own that gap. The number on the IPO is just a bet on which of them closes it first. I'd judge it the same way I judged the four diffs on my screen that Tuesday: not by what it demos, but by whether I had to clean up after it. So far, one of them rarely makes me. That's the whole story, and it's worth more than $965 billion to find out who joins it.
