The Lab That's Afraid of Its Own Homework

Rich Washburn
Jun 6

Anthropic just published a blog post called When AI Builds Itself. It's the most honest thing a frontier AI lab has ever put in writing. And the fact that they published it tells you everything you need to know about where we actually are.

Let's start with the number that should stop you mid-scroll.

More than 80% of the code currently being merged into Anthropic's production codebase was written by Claude. Read that again. The company building the AI is now primarily powered by the AI it's building. The loop is not theoretical. It is not a roadmap item. It is happening right now, in the building, on the commits, in the repo. Anthropic's median employee estimated 4x more output using Claude Mythos Preview compared to working without it. And an internal benchmark they've been running for years — where they ask Claude to optimize a small training run for speed — went from a 3x improvement last year to a 52x improvement this year.

A skilled human researcher, working hard, would need four to eight hours to hit 4x. Claude hit 52x. That's not merely helpful. That's a different category of thing.

One of Anthropic's own researchers, writing anonymously in the same post, said this: "On days where everything works well, I can't help but think that nothing I do matters. Everything is automated and better and faster than I will ever be. But then there are days where everything breaks and I don't understand why. I realize I have no idea what I've been up to anymore."

This person is one of the smartest people on Earth, working on one of the most consequential problems in human history. And they're describing the experience of being simultaneously outperformed and confused by their own tools. That's not a quote from a sci-fi novel. That's from a blog post published this week by the company building the thing.

Here's the part that gets underreported. The architecture behind all of this (the multi-agent systems, the parallel sandboxes, the author-executor-critic loops) did not arrive without warning signals. Back in late 2023, researchers at Carnegie Mellon and Unanimous AI were publishing early results on Conversational Swarm Intelligence: AI agents acting as real-time bridges between human groups, breaking information silos, cross-pollinating ideas across chat rooms. Microsoft's AutoGen was getting attention in the same window: multiple agents collaborating in defined roles, with an author writing, an executor running code, an editor reviewing, and a critic checking quality. The scaffolding for everything Anthropic is now doing internally was visible in the research and in the tools for anyone paying close attention.
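For readers who want to see the shape of that role split, here is a minimal sketch of an author-executor-critic loop. It is a toy, not AutoGen's API and not Anthropic's internal setup: every name in it (`run_loop`, `Review`, the three `toy_*` functions) is hypothetical, and the "agents" are plain functions standing in for model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Review:
    approved: bool
    feedback: str


def run_loop(
    author: Callable[[int, Optional[int], str], int],
    executor: Callable[[int], int],
    critic: Callable[[int, int, int], Review],
    task: int,
    max_rounds: int = 5,
) -> Tuple[Optional[int], List[Tuple[int, int, bool]]]:
    """Run author -> executor -> critic until the critic approves or rounds run out."""
    draft: Optional[int] = None
    feedback = ""
    history: List[Tuple[int, int, bool]] = []
    for _ in range(max_rounds):
        draft = author(task, draft, feedback)   # author proposes or revises a draft
        result = executor(draft)                # executor actually runs the draft
        review = critic(task, draft, result)    # critic judges the observed result
        history.append((draft, result, review.approved))
        if review.approved:
            return draft, history
        feedback = review.feedback              # critic's feedback steers the next draft
    return None, history


# Toy stand-ins (assumptions for illustration): find a number whose square is the task.
def toy_author(task: int, previous: Optional[int], feedback: str) -> int:
    if previous is None:
        return 5
    return previous + 1 if feedback == "higher" else previous - 1


def toy_executor(draft: int) -> int:
    return draft * draft


def toy_critic(task: int, draft: int, result: int) -> Review:
    if result == task:
        return Review(True, "")
    return Review(False, "higher" if result < task else "lower")


if __name__ == "__main__":
    answer, history = run_loop(toy_author, toy_executor, toy_critic, task=49)
    print(answer, history)
```

The point of the sketch is the control flow: nothing is accepted until it has been run and reviewed, and a round limit keeps the loop from spinning forever.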

The question in 2023 was whether these multi-agent systems would stay experimental or become load-bearing infrastructure. Anthropic just answered that. They didn't adopt the architecture — they became it.

The same week this post dropped, the heads of the major AI labs signed a letter to Congress alongside other prominent technologists. The signatories include Sam Altman, Dario Amodei, Demis Hassabis, Mustafa Suleyman, Paul Graham, Patrick Collison, and a long list of genomics founders and biotech researchers who don't have a PR interest in this conversation. Their ask was narrow: mandate screening for orders of synthetic nucleic acids.

Why? Because AI systems now outperform PhD-level virologists on technical laboratory procedure questions in their own domain. The capability curve that produced a 52x research optimization speedup is the same curve that applies to biology, chemistry, and physics. The biotech researchers who signed this letter aren't doing it for exposure. They're doing it because they're watching the same data everyone else is watching and doing the math.

This isn't the AI safety community sounding familiar alarms. This is the CEOs of the companies producing the capabilities raising their hands on a specific, narrow, technically solvable problem and asking for help before it becomes unsolvable.

Anthropic's post lays out three futures. They're unusually honest about which ones they think are likely.

The first scenario is that the curve plateaus. Progress slows, open source catches up, the frontier models stop compounding. They include this scenario for completeness. They don't believe it.

The second scenario is sustained compounding efficiency — but with humans still in the loop as the judgment layer. AI does the work. Humans set the direction, review the output, steer the experiments. The bottleneck becomes finding enough skilled people to manage the throughput. This is the scenario Anthropic thinks the current evidence points toward. It's also, counterintuitively, the more hopeful scenario — not because it's comfortable, but because it keeps humans meaningfully in the system. The job disruption is real. The acceleration is enormous. But it's navigable.

The third scenario is recursive self-improvement in the full sense. RSI. The AI does AI research better and faster than any human team can. As it improves itself, it also improves its ability to improve itself. The cost of intelligence drops toward zero. Human researchers become — as the anonymous quote describes — people who can't contribute when things go well and can't understand what's broken when things go wrong.

Anthropic didn't publish this post to generate coverage. They published it because they believe they're approaching the boundary between scenario two and scenario three, and they want a verified coordination mechanism in place before they cross it. Specifically: a global slowdown protocol where every major lab can slow down and confirm that everyone else has actually slowed down. Because if nine of ten players stop and one doesn't, the one who keeps going wins everything. That's not hypothetical governance architecture. That's the exact game theory problem they're trying to solve in real time.
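The game theory can be made concrete with a toy model. Everything here is an assumption for illustration, not anything from Anthropic's post: the payoff numbers are invented, and "verification" is modeled simply as the other labs noticing a defection and resuming, so the prize gets split instead of captured.

```python
def defect_payoff(n_labs: int, prize: float, verified: bool) -> float:
    """Payoff to a single lab that keeps going while everyone else has agreed to slow.

    Unverified: nobody notices, the defector takes the whole prize.
    Verified: the others see it and resume, so the prize is split n_labs ways.
    """
    return prize / n_labs if verified else prize


def best_response(
    n_labs: int = 10,
    prize: float = 10.0,   # hypothetical value of "winning everything"
    shared: float = 2.0,   # hypothetical payoff each lab gets if all slow down
    verified: bool = False,
) -> str:
    """Return the choice that pays more for one lab, given the others slow down."""
    if defect_payoff(n_labs, prize, verified) > shared:
        return "continue"
    return "slow"


if __name__ == "__main__":
    print("without verification:", best_response(verified=False))
    print("with verification:   ", best_response(verified=True))
```

Under these made-up numbers, the lone holdout wins big when nobody can check, and gains nothing by defecting once everyone can. That is the whole argument for the "confirm" half of the protocol.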

The intelligence staircase is a useful frame here. We tend to think of intelligence as a continuous scale: smarter, slightly smarter, much smarter. But the history of biological intelligence suggests it works more like a step function. The jump from ant to chicken to chimp to human isn't a smooth curve. Each step unlocks qualitatively different capabilities that simply don't exist on the step below. The chimp, one step below the human, couldn't predict language or calculus. Whatever sits on the step above us is, by definition, outside our ability to model from where we're standing.

Anthropic knows this. It's the reason they exist. It's the reason they published the post. And it's the reason the same people who were competing ferociously for GPU allocation and talent last week all signed the same letter to Congress this week. The machine is starting to build itself. The people building it are raising their hands. Whether anyone in a position to act is watching the same movie — that remains the open question.

Rich Washburn is a technologist and strategist working at the intersection of AI, cybersecurity, and capital. He is Managing Partner and Chief AI Officer at Eliakim Capital, and CIO of Data Power Supply.
