
Anthropic gave every paid subscriber a science lab. I'm not a scientist, so I went looking for the wall.

Claude Science turns the chatbot into an instrument for real research — and the most useful thing it told me in a week of poking at it is that it was never built for me. That, not the drug-discovery press release, is the actual news.

Daniel Vance·Jul 6·9 min

Claude Science workbench interface showing a research node on a desktop.

Image: Anthropic

On the first of July, Anthropic switched on something called Claude Science, and because I pay for the Max plan, it appeared in my account the same day with no fanfare and no scientist to supervise me. This is the part of these launches I like: the gap between who a tool is for and who can actually open it. Claude Science is built for the kind of person who says "pull the ClinVar entries for that variant" and means something specific by it. I am a person who reviews calendar apps. I spent a week inside it anyway, on the theory that the fastest way to understand what a professional tool really is, is to hand it to someone unqualified and watch where it stops making sense. It stopped making sense in a very informative place.

First, what it is, because "AI for science" is doing a lot of undefined work in the coverage. Claude Science is not a smarter chatbot with a lab coat on. It is a workbench — a single environment that wires Claude into more than sixty pre-configured connectors and "skills" aimed at biology: genomics, proteomics, structural biology, cheminformatics, single-cell analysis. It can reach into the databases working biologists actually use — PubMed for literature, UniProt for proteins, the Protein Data Bank for structures, Ensembl and GEO and ChEMBL and ClinVar for genes, expression data, chemistry, and clinical variants — and it can call specialised models, including NVIDIA's BioNeMo toolkit, to predict a protein's shape or score a molecule. It runs code in a sandbox, renders a 3D protein structure or a genome-browser track right in the window, and — this is the part it keeps advertising — produces an "auditable artifact" at the end, a reproducible trail of what it did. It is available in beta to Pro, Max, Team, and Enterprise subscribers, on macOS and Linux. Which is to say: to me.

The first hour goes suspiciously well

My first real task was deliberately modest. I picked a protein I could actually check — the one behind a well-studied disease — and asked Claude Science to find it, pull its structure, and summarise what's known about a specific mutation. In a normal chatbot this is where you get a confident paragraph you have no way to trust. Here it did something different: it went and got the thing. It resolved the protein to its UniProt record, pulled the matching structure out of the Protein Data Bank, and rendered it in three dimensions in the panel next to the chat, rotating, actually there. Then it queried the literature and handed back a summary with the retrieval steps attached, so I could see which records it had opened rather than just the prose it wrote about them.

I want to give this real credit, because it is the difference that matters. A chatbot tells you about a protein. This retrieved the protein. The distinction between a model that generates a plausible answer and a tool that goes and fetches the underlying record and shows its work is the entire distance between "interesting" and "usable," and for the first hour I was genuinely impressed in a way the demos hadn't earned from me in months. The 3D structure spinning in the corner is a good magic trick, and unlike most good magic tricks it is attached to a real database entry you could go verify.

A chatbot tells you about a protein. This went and got the protein. That distinction is the entire distance between "interesting" and "usable" — right up until it hands you something only an expert can grade.

Where it stopped making sense — which was me

The wall showed up on Thursday, and it was not a bug. I asked it to do something a step past retrieval: take a candidate molecule, predict how it might bind, and tell me whether that looked promising. It did the work. It called the models, produced scores, generated a clean write-up with figures and a reproducible record of every step. And I sat there looking at a competent-looking scientific artifact I was completely unqualified to judge. Was the binding score good or embarrassing? Had it chosen a sensible method or a naive one? Was the confident conclusion warranted, or was this the science equivalent of the calendar agent booking me a dentist in a suburb I don't live in — fluent, formatted, and quietly wrong? I could not tell. Not because the tool failed, but because the tool had finally reached the altitude it was built for, and I couldn't breathe up there.

That is the honest verdict on Claude Science, and it is a stranger one than "it's good" or "it's bad." The tool is not for me, and every seam I hit was the seam between it and my own lack of expertise, not a flaw in the software. That sounds like a boring conclusion until you notice what it implies. The reason a normal chatbot feels safe to a non-expert is that it stays vague enough for you to sanity-check with common sense. Claude Science removes the vagueness. It gives you specific, load-bearing scientific output — the kind you can build an experiment on — and the price of that specificity is that you now need a specialist's judgement to know whether to trust it. The better it gets at doing the work, the more it demands that you already know the answer, or know enough to smell when the answer is off.

The reproducibility thing is the actual feature

Once I stopped trying to be a scientist and started watching the tool as a tool, the feature that kept earning its place was the least flashy one: the auditable artifact. Every run leaves a trail — which database, which query, which model, which parameters, in an order someone else can re-execute. In consumer AI this would be a footnote. In science it is close to the whole point, because the failure mode everyone in the field is bracing for is not that the model is wrong occasionally; it is that it is confidently, unreproducibly wrong, and produces a beautiful paragraph that sends a lab down a three-month dead end no one can retrace. The artifact is Anthropic's answer to that, and whether it holds up is the question I'd actually want a working biologist to stress-test. From where I sat it looked like the difference between a colleague who shows their working and one who just tells you the result and asks you to relax.
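To make "a trail someone else can re-execute" concrete, here is a minimal Python sketch of the idea. It makes no claim about Anthropic's actual artifact format: the `Step` and `AuditTrail` classes, their fields, the `docking-model` name, and the `EXAMPLE_...` placeholders are all hypothetical, invented for illustration. The only point it demonstrates is that an ordered, fully parameterised log can be fingerprinted, so two people can check they ran the same thing.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Step:
    """One recorded action (hypothetical schema, not Anthropic's)."""
    source: str   # database or model that was called
    action: str   # what was asked of it
    params: dict  # the exact parameters used for the call


@dataclass
class AuditTrail:
    """An ordered list of steps that can be serialised and fingerprinted."""
    steps: list = field(default_factory=list)

    def record(self, source, action, **params):
        # Append in call order; the order is part of the record.
        self.steps.append(Step(source, action, params))

    def to_json(self):
        # sort_keys keeps the serialisation deterministic across runs.
        return json.dumps([asdict(s) for s in self.steps], sort_keys=True)

    def fingerprint(self):
        # Same steps, same order, same parameters -> same SHA-256 digest.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


def build_example_trail():
    """Build a three-step trail using placeholder identifiers only."""
    trail = AuditTrail()
    trail.record("UniProt", "lookup", accession="EXAMPLE_ACCESSION")
    trail.record("Protein Data Bank", "fetch_structure", entry="EXAMPLE_ENTRY")
    trail.record("docking-model", "score", ligand="EXAMPLE_LIGAND", seed=0)
    return trail
```

If two collaborators build the same trail, `fingerprint()` matches; if one of them changes a parameter or adds a step, it does not. That is the whole trick, and it says nothing about whether the science inside the steps is any good.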

There's a business tell buried in the pricing, too, and it fits. Team plans get discounted seats for universities and non-profit research labs: the customers with the expertise and none of the budget, exactly the ones you court when you want serious users to shake the beta hard before the paying enterprises show up. Anthropic isn't monetising me poking at proteins for a column. It's using the discounted academic tier to get the tool in front of people who can tell it when it's wrong, which is the only kind of feedback that improves a science tool and the one it can't generate on its own.

The drug program is dogfooding, not a cure

Alongside the workbench, Anthropic said it will run its own in-house preclinical drug-discovery programs, aimed at "neglected" and rare diseases that big pharmaceutical companies skip because the economics don't work. This is the line that got the headlines, and it deserves a cold-eyed read, because it is easy to hear it as "Anthropic is going to cure the diseases nobody else will" and that is not what was announced. What was announced is that Anthropic will use its own tools to try, in order to learn firsthand how to build better tools — the company's life-sciences lead, Eric Kauderer-Abrams, has been fairly direct that the point is to feel where the workflow breaks by living inside it. The neglected-disease framing is genuine and also convenient: it picks targets big pharma has vacated, so the effort competes with no one and generates goodwill while the real product being refined is the workbench I spent the week in.

I don't say that cynically. Dogfooding your own tool on a hard real problem is exactly how tools get good, and picking problems the market ignores is a defensible place to do it. But the news here is not a drug. There is no drug. There is a company building an instrument and using a sympathetic research program as its most demanding test case, and early outside users — Manifold Bio, which designs tissue-targeting medicines, was in the beta — are the ones whose verdicts will actually matter. Treat the neglected-disease program as a roadmap for the workbench, not a pipeline for a pharmacy.

The verdict, from someone it isn't for

Here is where I usually tell you whether to keep the subscription, and this time the answer splits cleanly, which almost never happens. If you are a working biologist or chemist — someone who can look at a binding score and know whether to laugh — open this today. It collapses a dozen separate tools and tabs into one place, it fetches real records instead of hallucinating about them, and it shows its work in a form your collaborators can re-run. The seams I hit are not seams for you; they're the floor you already stand on. This is the first "AI for science" launch I've touched that felt like an instrument rather than a demo, and I've touched a lot of demos.

If you are the rest of us (curious, technical-adjacent, not a domain expert), the interesting thing about Claude Science is not anything you would do with it. It's what it tells you about where this is going. Quietly, without a keynote, Anthropic changed what the word "Claude" points at. For two years it meant a thing you talk to. Claude Science is a thing that operates instruments, pulls real data, and hands back output only a professional can grade. The chatbot didn't get smarter this week; it got more specialised, and specialisation has a cost the chat era hid from you: the tool stops meeting you halfway. That's not a complaint. It's the whole story of the thing, and I only found it because I was the wrong person holding it. I'm keeping the plan. I'm just going to go back to reviewing the calendar apps I'm qualified to break.
