The model is the loot now

The hackers selling Novo Nordisk's data listed something new on the menu: 30 trained AI models

A drugmaker's most valuable secret used to be its files. This week a criminal group put the company's trained drug-discovery models and a half-terabyte of lab images up for private sale — and the security posture protecting them was an ordinary code repository.

Sam Brenner·Jun 18·8 min

A Novo Nordisk corporate building exterior with the company name on the facade

Image: ManuelB701 / Wikimedia Commons (CC0)

Read enough breach disclosures and the inventory stops surprising you. There are always patient records, always employee files, always a line about login credentials and 'certain internal systems.' The data trove that a group calling itself FulcrumSec began leaking this week, after the Danish drugmaker Novo Nordisk declined to pay a $25 million ransom, has all of that. But it also has a line item I had not seen priced on a leak site before, sitting in the same list as the stolen passwords and the patient files: thirty trained artificial-intelligence models, seventy datasets, and 494 gigabytes of what the group describes as proprietary cell-painting microscopy images. The criminals say they are exploring 'private sales' for much of it. The models are on the menu.

That is the part of this breach worth slowing down for, because it marks a shift in what a theft like this can carry off. The records are an old story with an established market. The models are a new one. A trained drug-discovery model is not a list of customers or a spreadsheet of Social Security numbers. It is the distilled product of years of laboratory work, proprietary chemistry and expensive computation, and arguably the single most valuable thing a modern pharmaceutical company makes that is not a drug. And this week it turned out to have the portability of a zip file and, on the evidence of how it was taken, roughly the protection of one.

What the company has confirmed, and what it hasn't

Start with what is on the record, because everything that follows depends on keeping that line clear. Novo Nordisk disclosed a cybersecurity incident on June 11, confirming 'unauthorized access' to certain internal IT systems and that 'non-public' data had been 'copied externally without authorization.' The company said it had kept its main platforms running and that protecting its systems and its patients remained its priority. That is the confirmed core: someone got in, and data left.

Everything past that point is, for now, the attackers' account. FulcrumSec claims the haul totals 1.3 terabytes. It claims the thirty trained models, the seventy datasets, the 494 gigabytes of microscopy images. It claims pseudonymized clinical-trial data on roughly 11,500 patients, employee information, and details of drug compounds in testing or use for diabetes, weight loss, chronic kidney disease and sickle-cell disease. These are a seller's claims, made by a criminal group with an obvious incentive to inflate the value of what it is fencing, and Novo Nordisk has not itemized the stolen material or confirmed that thirty usable models are among it. I am going to treat the model inventory as alleged, because that is what it is. But the allegation is specific, and it is the specificity (a counted number of models, a measured volume of images, an offer to sell them separately) that makes it worth taking seriously rather than waving off.

How they say they got in

Follow the chain of custody the attackers describe, because it is mundane in exactly the way these things usually are. FulcrumSec says it gained access in March through two credentials it should never have been able to find. The first was a credential for an Azure container registry that had been 'baked' into a client-side JavaScript bundle — that is, embedded in code shipped to ordinary users' web browsers, which is one of the few places in computing where a secret is guaranteed to be readable by anyone who looks. The second was a GitHub personal access token that, the group says, opened hundreds of private code repositories. From there it claims it spent more than two months inside the network before anyone noticed.
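Hard-coded secrets in shipped JavaScript are exactly what automated scanning is meant to catch before a build goes out. The sketch below is a minimal Python illustration, assuming the bundle is available as text. The regular expressions, the `scan_bundle` helper and the entropy threshold are illustrative assumptions, not a complete rule set, and nothing here describes Novo Nordisk's tooling; dedicated tools such as GitHub secret scanning or gitleaks maintain far broader pattern libraries.

```python
import math
import re
from collections import Counter

# Illustrative patterns (an assumption, not a complete rule set). GitHub's
# classic personal access tokens start with "ghp_" and fine-grained ones with
# "github_pat_"; the length bounds used here are approximate.
TOKEN_PATTERNS = {
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "github_fine_grained_pat": re.compile(r"\bgithub_pat_[A-Za-z0-9_]{22,}"),
    # Azure Container Registry login servers end in ".azurecr.io". A hostname
    # alone is not a secret, but one sitting in a browser bundle deserves a look.
    "azure_registry_host": re.compile(r"\b[a-z0-9]+\.azurecr\.io\b"),
}

# Quoted literals of 20+ token-like characters get an entropy check.
STRING_LITERAL = re.compile(r"""["']([A-Za-z0-9+/=_\-]{20,})["']""")


def shannon_entropy(s):
    """Bits per character; random secrets tend to score higher than words."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def scan_bundle(text, entropy_threshold=4.5):
    """Return (kind, matched_text) pairs for strings that look like secrets."""
    findings = []
    for kind, pattern in TOKEN_PATTERNS.items():
        findings.extend((kind, m.group(0)) for m in pattern.finditer(text))
    for m in STRING_LITERAL.finditer(text):
        if shannon_entropy(m.group(1)) >= entropy_threshold:
            findings.append(("high_entropy_string", m.group(1)))
    return findings
```

Run over a production build before deployment, a check like this would surface a token-shaped string. The entropy test is a coarse second net for secrets with no recognizable prefix, and it will produce false positives that someone has to review.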

A credential in client-side JavaScript is a key taped to the front door. A personal access token to hundreds of private repositories is the master key to the building. The records were the easy part of what that opened. — On the path the attackers describe

Take each link on its own and it is a known, almost ordinary failure. Secrets get hard-coded into bundles by developers moving fast; access tokens get scoped too broadly because narrowing them is tedious; intrusions go undetected for weeks because detection is harder than prevention and gets less budget. Matt Kimpel, a chief information security officer at the managed-services firm Magna5, put the emphasis where it belongs when he told reporters that 'the real story is dwell time': the two-plus months the intruder reportedly had to move through the network and decide what was worth taking. But assemble the links and you get the shape of the problem: a path that ran from a carelessly stored key, to the source code, to the repositories where a drug company increasingly keeps the thing it is actually made of. Each handoff, on its own, is easy to explain away. Together, they form a route to the crown jewels.
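Dwell time is partly a detection problem, and one cheap signal is a token that suddenly touches far more repositories than it usually does. A minimal sketch, assuming a hypothetical access log of `(token, repository, day)` events; the `AccessEvent` type, the 5x multiplier and the 20-repository floor are all illustrative assumptions, not a description of any real monitoring product:

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class AccessEvent:
    token_id: str
    repo: str
    day: int  # simplified timestamp: whole days since an arbitrary start


def flag_bulk_access(events, multiplier=5.0, min_repos=20):
    """Return (token_id, day, repo_count) where a token far exceeded its own history."""
    # Distinct repositories each token touched on each day.
    repos_by_token_day = defaultdict(set)
    for event in events:
        repos_by_token_day[(event.token_id, event.day)].add(event.repo)

    counts = defaultdict(dict)  # token -> {day: distinct repo count}
    for (token, day), repos in repos_by_token_day.items():
        counts[token][day] = len(repos)

    alerts = []
    for token, by_day in counts.items():
        for day in sorted(by_day):
            count = by_day[day]
            # Baseline is the median of earlier days only; 1 if there is no history.
            history = [c for d, c in by_day.items() if d < day]
            baseline = median(history) if history else 1
            if count >= min_repos and count > multiplier * baseline:
                alerts.append((token, day, count))
    return alerts
```

A patient intruder can stay under any fixed threshold, so a check like this shortens dwell time for bulk cloning rather than eliminating it.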

Why the models are the asset that changed

There is a reason a half-terabyte of microscopy images sits next to the trained models on the sales list, and it explains why this category of theft matters more than the patient count. Cell-painting is a technique that stains cells so that thousands of features become measurable in a single image; run at scale, it produces exactly the kind of large, labeled, expensive dataset that drug-discovery models are trained on. The images are the fuel; the models are the engine built from it. Together they are a substantial fraction of how a company like this turns laboratory effort into a competitive edge. Mike Hamilton, a longtime healthcare CISO, noted that clinical-trial data is 'one of the most valuable types of data' a healthcare organization holds. The models trained on top of it are arguably more valuable still, because they encode not just the results but the method.

And here is the structural fact the breach exposes. Companies across every research-heavy industry have spent the last few years pouring their value into model weights and training corpora — moving the crown jewels from the filing cabinet into the Git repository — without moving the security to match. A trained model is treated, in practice, like any other large file in a developer's workflow: stored in a repository, reachable with a developer's token, copied as easily as source code. But it is not like other files. It is years of work that can be exfiltrated in an afternoon and, the attackers are betting, resold to a competitor or a state research program that would happily skip the years. The asset class outran the controls protecting it. That is the part that should worry boards well beyond Copenhagen.
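The first practical step in treating models differently from other files is knowing where they sit. A minimal sketch, assuming weights and training data can be recognized by file extension; the extension lists are a hypothetical, non-exhaustive choice (`.tif`/`.tiff` stand in for microscopy images), and `classify` and `inventory` are illustrative helpers that walk a checked-out repository and list files that arguably belong in a separately controlled store:

```python
from pathlib import Path
from typing import Dict, List, Optional

# Assumed, non-exhaustive extension lists; a real team would tune these.
MODEL_EXTENSIONS = {".pt", ".pth", ".ckpt", ".safetensors", ".onnx", ".h5"}
DATA_EXTENSIONS = {".parquet", ".npy", ".npz", ".tif", ".tiff"}


def classify(path: Path) -> Optional[str]:
    """Label a file as model weights, training data, or neither (None)."""
    suffix = path.suffix.lower()
    if suffix in MODEL_EXTENSIONS:
        return "model_weights"
    if suffix in DATA_EXTENSIONS:
        return "training_data"
    return None


def inventory(repo_root: Path) -> Dict[str, List[Path]]:
    """Walk a checked-out repository and list weight and data files found in it."""
    found: Dict[str, List[Path]] = {"model_weights": [], "training_data": []}
    for path in repo_root.rglob("*"):
        if ".git" in path.relative_to(repo_root).parts:
            continue  # skip Git's internal object store
        if path.is_file():
            label = classify(path)
            if label is not None:
                found[label].append(path.relative_to(repo_root))
    return found
```

Extension matching misses renamed or packed files and flags some harmless ones, so this is an inventory aid, not a control in itself.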

The market the sale implies

The phrase 'private sales' is doing quiet, heavy work in FulcrumSec's posts, and it deserves to be named for what it gestures at. Stolen consumer records get dumped or sold in bulk on familiar criminal markets, priced low because supply is enormous. Offering trained models and a curated scientific dataset for private sale implies a different kind of buyer — one who wants the asset for its function, not its resale, and who would pay because acquiring the capability legitimately would cost vastly more and take vastly longer. Who that buyer is, the record does not say, and I am not going to guess. It is the question the case turns on and the one we cannot yet answer: whether there is a working market for stolen AI models the way there is for stolen identities, and what it pays. The offer to sell tells you the sellers believe there is. It does not tell you they are right.

It is also worth stating plainly where else the record runs out. We do not independently know that the thirty models are complete, current, or usable without the surrounding pipeline of code and infrastructure that a real deployment requires; a model file detached from its tooling can be far less than it sounds. We do not know that the microscopy dataset is what the attackers say it is. What we know is what Novo Nordisk confirmed — access, and exfiltration of non-public data — and what an extortion group claims it is now trying to sell. Those are two different evidentiary standards, and the responsible way to cover this is to keep them apart and to say so.

What it would take to treat a model like the asset it is

If the alleged inventory is even partly real, the lesson is not a new firewall. It is a reclassification. Model weights and the datasets they are trained on need to be handled as crown-jewel assets: segregated from ordinary code, access-logged, encrypted at rest, and reachable only through credentials scoped to a single purpose and rotated often, not through a developer token that opens hundreds of repositories at once. None of that is exotic; it is the standard already applied to the most sensitive data in better-run organizations. That standard simply has not caught up with the idea that a trained model is now among the most sensitive data an organization holds.
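The credential rules in that paragraph can be written down as a policy check. A minimal sketch, assuming a hypothetical `Credential` record with an explicit set of artifact scopes and an issue time; the 30-day rotation window, the `authorize` helper and the artifact names are illustrative assumptions, and encryption at rest is outside what this snippet covers:

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Optional

logging.basicConfig(level=logging.INFO)
access_log = logging.getLogger("model_access")

# Hypothetical policy value chosen for illustration.
MAX_CREDENTIAL_AGE = timedelta(days=30)


@dataclass(frozen=True)
class Credential:
    credential_id: str
    scopes: FrozenSet[str]  # artifacts this credential may read
    issued_at: datetime     # timezone-aware issue time


def authorize(cred: Credential, artifact: str, now: Optional[datetime] = None) -> bool:
    """Allow access only for a single-purpose, in-scope, recently rotated credential."""
    now = now or datetime.now(timezone.utc)
    reasons = []
    if len(cred.scopes) != 1:
        reasons.append("not single-purpose")
    if artifact not in cred.scopes:
        reasons.append("artifact outside scope")
    if now - cred.issued_at > MAX_CREDENTIAL_AGE:
        reasons.append("past rotation window")
    allowed = not reasons
    # Log every decision, allowed or denied, so access to the artifact is auditable.
    access_log.info("cred=%s artifact=%s allowed=%s reasons=%s",
                    cred.credential_id, artifact, allowed, reasons)
    return allowed
```

Under this rule a broadly scoped developer token fails the single-purpose check even for an artifact it does cover, which is the point: breadth itself is treated as a defect.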

There is a regulatory lag underneath the technical one, and it is the part that will outlast this news cycle. Breach-disclosure law and the securities filings that companies make after an intrusion were built around personal data — how many people were affected, what identifiers were exposed, what notification is owed. They have almost no vocabulary for the theft of a company's trained models, even though that loss may do more lasting damage to the business than any number of exposed records. A company can notify 11,500 patients. It cannot un-leak a model that took years and a fortune to build, and the rules do not yet require it to even describe that loss in the same terms. Until the disclosure regime names the asset, the public record of an event like this will keep undercounting what was actually taken. The records were the headline. The models were the theft.
