China trained a trillion-parameter model on domestic chips. It took fifty thousand of them.

Meituan's LongCat-2.0 is the first frontier-scale model trained end to end without Western silicon. The achievement is real. What it hides is where the chokepoint actually moved.

Jide Okafor · Jul 5 · 10 min read

A 300-millimetre silicon wafer held at an angle, its surface refracting light into bands of colour.

Image: Ehsanshahoseini / Wikimedia Commons (CC BY-SA 4.0)

There is a photograph the Chinese computing industry would like you to look at, and a question it would prefer you not ask. The photograph is of a data hall: row upon row of accelerator boards, tens of thousands of them, wired into what the engineers call superpods and fed by power and cooling infrastructure that would not look out of place beside a small generating station. The question is what, exactly, is printed on the silicon inside those boards. Because on the last day of June, a company still better known in most of the world for delivering meals told everyone it had used a hall like that one to do a thing only Nvidia's biggest customers and Google's own campuses had managed before: train a frontier-scale language model, from the first token to the last, without a single Western chip in the loop.

The company is Meituan, and the model is LongCat-2.0. It is a mixture-of-experts system with 1.6 trillion parameters in total and roughly 48 billion of them active on any given token, a one-million-token context window, and — according to Meituan — a training run of more than 35 trillion tokens carried out end to end on a cluster of about 50,000 domestic AI ASICs. It was open-sourced on the 30th of June, and for a while before that it had been quietly sitting near the top of the OpenRouter usage charts for agentic coding under a name most people did not recognise, undercutting GPT-5.5 and Claude Sonnet on price. On its own terms, it is an impressive machine. The interesting part is not the machine. It is the sentence Meituan chose to build the announcement around — that this was the first trillion-parameter model to complete full-process training and inference on a domestic computing cluster — and everything that sentence is asking you to take on trust.

What the achievement actually is

Start with what is not in dispute, because it is genuinely hard. Training a model at this scale is not one big computation but a choreography of tens of thousands of chips that must stay in step for weeks without a fatal disagreement. The parameters are split across the cluster; gradients are exchanged over the interconnect thousands of times; a single chip that returns a slightly wrong number, or drops off the network, or overheats and throttles, can silently poison a run that has already cost millions to reach. Doing this on Nvidia's hardware is difficult and well-trodden — there are years of tooling, libraries, and hard-won operational knowledge behind it. Doing it on a different vendor's ASICs, with a different memory layout and a different interconnect fabric and none of that inherited scaffolding, means rebuilding the parallelism strategy, the fault tolerance, and the numerical stability from a lower floor. Meituan says it did that. There is no strong reason yet to think it didn't.

It also matters which half of the work was done on domestic silicon. When DeepSeek shipped its V4-pro family earlier in the year, it leaned on home-grown chips for inference — the comparatively forgiving job of running a finished model to answer a query — while the heavy lifting of pre-training reportedly still happened on hardware from the other side of the export-control line. LongCat's claim is the harder one: that pre-training, the part that actually stresses the interconnect and the memory system and the operators' nerves, ran on the domestic cluster too. If that holds, it is the first public demonstration that the most demanding workload in the industry can be executed without the specific chips Washington has spent three years trying to keep out of Chinese data centres. That is not nothing. It is, in fact, the whole point of the announcement, and it is why it travelled.

The number is the tell

Now the question the photograph does not answer. The figure at the centre of the story is 50,000 — fifty thousand accelerator cards. It is offered as a boast, and read quickly it sounds like one. Read the way a fab engineer reads a yield report, it is something closer to a confession. Card count is not a measure of capability; it is a measure of how much hardware you had to throw at the problem to get there. A frontier run is bounded by total effective compute and by how efficiently each chip contributes it. If you are short on per-chip performance — slower matrix units, less memory bandwidth, a weaker interconnect — you make it up the only way physics allows, which is by adding more chips and burning more power. The headline number everyone repeated is the amount of brute force the achievement required, and brute force has a bill attached.
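
The arithmetic behind that point can be sketched in a few lines. This is a rough estimate, not Meituan's accounting. It uses the common rule of thumb that training costs about six floating-point operations per active parameter per token, which ignores attention overhead at long context and any restarted work. It also has to assume a run length, because none was published. The parameter, token and chip counts are the ones Meituan quoted; the 30, 60 and 90-day durations are hypothetical.

```python
# Back-of-envelope only. Values marked ASSUMPTION are illustrative, not reported.

ACTIVE_PARAMS = 48e9   # from the announcement: roughly 48 billion active per token
TRAIN_TOKENS = 35e12   # from the announcement: more than 35 trillion tokens
CHIP_COUNT = 50_000    # from the announcement: about 50,000 ASICs


def training_flops(active_params: float, tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6.0 * active_params * tokens


def sustained_flops_per_chip(total_flops: float, chips: int, days: float) -> float:
    """Average useful FLOP/s each chip must deliver to finish in `days`."""
    seconds = days * 24 * 3600
    return total_flops / (chips * seconds)


if __name__ == "__main__":
    total = training_flops(ACTIVE_PARAMS, TRAIN_TOKENS)
    print(f"total training compute ~ {total:.2e} FLOPs")
    for days in (30, 60, 90):  # ASSUMPTION: the run length was not disclosed
        per_chip = sustained_flops_per_chip(total, CHIP_COUNT, days)
        print(f"{days:>3} days -> {per_chip / 1e12:.1f} TFLOP/s sustained per chip")
```

On these assumptions the run comes to roughly 1e25 FLOPs, which works out to about 78, 39 and 26 TFLOP/s of useful throughput per chip for the three durations. The shorter the real run, the more each chip must have sustained, which is why a disclosed chip model and run length would turn this range into a judgement.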

We do not know that bill, because Meituan has not published the two figures that would let anyone calculate it: the specific chip, and the power the cluster drew. Those absences are not incidental. A domestic ASIC that needs fifty thousand units and a substation's worth of electricity to match what a Western cluster does with a fraction of both is a real capability and a poor economic proposition at the same time — and in a country where the constraint on AI build-out is increasingly the grid rather than the fab, the electricity line is the one that decides whether this scales. "We can" and "we can afford to at scale" are different claims. The announcement makes the first and leaves you to assume the second.
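
The calculation those missing figures would unlock is simple enough to write down. The sketch below is illustrative only. The per-card wattage, the run length and the facility overhead (PUE) are placeholder values chosen to show the shape of the sum, not estimates of the real cluster. Only the 50,000-card count comes from the announcement, and the function counts accelerator cards alone, leaving out host servers and networking.

```python
# Illustrative only: wattage, duration and PUE are hypothetical placeholders.

def cluster_energy_mwh(chips: int, watts_per_chip: float,
                       days: float, pue: float = 1.0) -> float:
    """Energy for a run in megawatt-hours, counting accelerator cards only.

    pue is power usage effectiveness: facility power / IT power (>= 1.0).
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    it_power_mw = chips * watts_per_chip / 1e6  # card power in megawatts
    return it_power_mw * pue * days * 24        # MW * hours = MWh


if __name__ == "__main__":
    chips = 50_000  # from the announcement
    for watts in (400.0, 700.0, 1000.0):  # ASSUMPTION: per-card draw not disclosed
        # ASSUMPTIONS: 60-day run, PUE of 1.3
        mwh = cluster_energy_mwh(chips, watts, days=60, pue=1.3)
        print(f"{watts:>6.0f} W/card -> {mwh:,.0f} MWh over 60 days")
```

The output moves in direct proportion to the wattage, so the undisclosed per-card figure sets the size of the bill almost by itself.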

The chips no one will name

Here is where you have to follow the dependency one link further than the press release does, because the press release stops at exactly the point where it gets interesting. Meituan described the hardware as "large-scale clusters of tens of thousands of AI ASIC superpods." It did not name the ASIC. In an announcement engineered to make a point about sovereignty, the single most load-bearing fact — whose chip is this — is the one left blank. That silence is worth sitting with, because a chip is not a monolith. It is an assembly, and the assembly has its own upstream, and the upstream is where the export-control regime was always really aimed.

An AI accelerator at this scale is only as capable as the high-bandwidth memory stacked beside it. HBM is the part that feeds the compute; without enough of it, at enough bandwidth, the fastest logic in the world sits idle waiting for data. It is manufactured by a very short list of firms, it is itself subject to export restrictions, and it has been, judging by China's own hoarding behaviour over the past year, the component the domestic industry is most anxious about. A domestic ASIC still needs that memory from somewhere. It needs advanced packaging to bond the memory to the logic. And it needs a fab able to print the logic die at a competitive node, which in practice means a domestic foundry working at the edge of what its own restricted toolset allows. Route around the GPU and you have not escaped the chokepoint. You have arrived at the next one.

This is the pattern anyone who covers the supply chain learns to expect. A chokepoint does not vanish when you build a path around it. It relocates to the next narrowest place — the memory, the packaging line, the one tool, the one material — and waits there, quieter and harder to photograph. "Trained on domestic chips" is a claim about the accelerator. It is silent on every layer beneath it, and those layers are where the export controls bite. Until Meituan says what the chip is and what is stacked on it, the sovereignty story is a story about the top of the stack told as if it were the whole stack.

So it is worth being precise about what the demonstration proves and what it doesn't:

It proves that frontier-scale pre-training can run end to end on non-Western accelerators — the software and operational barrier is passable, not just theoretical.
It does not prove that the layers beneath the accelerator (the memory, the packaging, the foundry node) are free of the restricted supply chain the controls target.
It does not disclose the cost of the route around: fifty thousand cards and an unstated power draw are the price of matching, not beating, what a Western cluster does with less.
It does not, on its own, tell you whether this is a one-off heroic run subsidised to make a point, or a repeatable industrial process a company would choose on the economics.

Why give it away

There is a second tell, and it sits in the licensing rather than the hardware. LongCat-2.0 was open-sourced. A company that had merely built a competitive commercial model would have strong reasons to meter it. Giving the weights away, and pricing the hosted version below the American incumbents, is a strategy aimed at a different quantity than revenue: adoption. Open weights are an artefact, not a service — once they are out, they cannot be recalled, and every developer who wires LongCat into a workflow is a developer not wiring in a model that might one day be gated by an export rule. If the domestic-silicon claim is the flag, the open licence is the wedge. The point is not to sell you a model. The point is to make the model that was trained without Western chips the cheapest, most available option on the shelf, and to let the sovereignty argument ride out into the world attached to something free.

That is also why the unverifiable parts of the announcement should be held at arm's length rather than dismissed. The strong showing on OpenRouter's usage charts is a real signal of real use; the price undercut is checkable. The claim that fifty thousand domestic ASICs, and only domestic ASICs, did the training is the part that arrives without a serial number, and it is the part doing the geopolitical work. In this industry the checkable facts and the load-bearing facts are frequently not the same facts, and the gap between them is where the messaging lives.

Where this leaves the wall

The export-control regime was built on a premise that was, for a while, roughly true: that the compute needed to train a frontier model runs through a chokepoint the United States could hold — advanced GPUs, the packaging that assembles them, the memory that feeds them, the handful of firms and tools that make all three. LongCat-2.0 is the clearest evidence yet that the premise has a crack in it at the top layer. A frontier-scale model was trained without the GPU. The crack is real, and pretending otherwise would be the mirror-image mistake of taking the announcement at face value.

But a crack is not a collapse, and the shape of the crack matters more than its existence. If the domestic cluster still runs on restricted memory and restricted-tool-made logic, the wall has not been breached so much as its most-watched gate has been bypassed while the load quietly shifted onto the parts of the fence no camera is pointed at. That is not reassurance. It is a relocation of the problem, and relocation is how these dependencies have always behaved. The compute chokepoint did not disappear on the 30th of June. It moved down a layer, into the memory and the packaging and the fab, to the places that were always the harder ones to build and the easier ones to leave out of a press release. The most important number in the LongCat story is still the one Meituan hasn't printed — and until it does, the model that never touched a Western chip is a claim resting on hardware no one will name.