Recursive Self-Improvement: How Close Is AI to Improving Itself?

A nuanced look at what recursive self-improvement means, how close current AI really is, the frameworks pushing toward it, and what an intelligence explosion would imply.

Recursive self-improvement is the idea that an AI system could improve its own capabilities — and at the limit, design its own successor — setting off a compounding loop that some theorists call an intelligence explosion. It is one of the oldest and most contested ideas in artificial intelligence. It is also, suddenly, less abstract than it used to be. Here is what the concept actually means, how close current systems are, and what is being built that points in its direction.

What the term actually means

The concept goes back to the mathematician I.J. Good, who in 1965 described a machine that could design ever-better machines; the writer Eliezer Yudkowsky later popularised the related notion of a "seed AI." The theoretical endpoint is superintelligence: a system that, once able to enhance itself, races past human capability. To be precise, self-improvement is a spectrum, not a switch. Today's systems can help build better AI, but they still rely on humans to set goals, define what counts as success, and decide which changes to keep.

How close are we, really?

Anthropic, in its own writing on the subject, is careful: it says we are not there yet and that recursive self-improvement is not inevitable, but that it could come sooner than most institutions are prepared for. It frames progress as three gaps, each at a different stage of closing. Models are already good at executing well-specified engineering and research tasks. They are getting better at proposing their own experiments. And they are still weak at the hardest part: exercising independent judgment about which high-level goals and directions are worth pursuing at all.

The trend lines are real and worth taking seriously without over-reading them. Benchmarks that measure software engineering and the ability to reproduce published research have gone from hard to nearly saturated in one to two years. One measure of how long a task a model can carry out autonomously has been roughly doubling every few months. Anthropic reports that its own model now writes most of the company's code.
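The doubling claim is easier to weigh with the arithmetic written out. What follows is a minimal sketch that assumes clean exponential growth. The 60-minute starting horizon and the 6-month doubling period are illustrative placeholders, not figures taken from any benchmark or from the sources above.

```python
import math

def horizon_after(start_minutes, doubling_months, elapsed_months):
    """Task horizon after elapsed_months, assuming steady doubling."""
    return start_minutes * 2 ** (elapsed_months / doubling_months)

def months_to_reach(start_minutes, doubling_months, target_minutes):
    """Months needed for the horizon to grow from start to target."""
    return doubling_months * math.log2(target_minutes / start_minutes)

# Hypothetical inputs: a 1-hour horizon today, doubling every 6 months.
print(horizon_after(60, 6, 24))     # 960.0 minutes, i.e. 16 hours after 2 years
print(months_to_reach(60, 6, 480))  # 18.0 months to reach an 8-hour horizon
```

The result is very sensitive to the assumed doubling period, which is one reason to treat extrapolations from a short trend with caution.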

But the skeptics are not fringe. Researchers point to "lossy" self-improvement, where friction and error compound and slow progress rather than accelerate it. Others note that today's models still fall short of top human researchers, that a great deal of research knowledge is tacit and undocumented, and that models remain merely decent at the crucial steps of generating and judging ideas. And there are hard physical limits: real improvement eventually runs into chips, power, and the cost of experiments that can run into the billions. Human oversight, for now, is the binding constraint.
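One way to picture the "lossy" argument is a toy model in which each generation's gain is a shrinking fraction of the previous one. This is an assumption made purely for illustration, not a model proposed by the researchers mentioned above; the parameter names `initial_gain` and `retention` are hypothetical.

```python
def simulate(initial_gain, retention, generations, start=1.0):
    """Compound capability over generations.

    Each generation multiplies capability by (1 + gain); the gain itself
    is then scaled by `retention` (1.0 = lossless, below 1.0 = lossy).
    """
    capability = start
    gain = initial_gain
    for _ in range(generations):
        capability *= 1.0 + gain
        gain *= retention
    return capability

# Toy numbers: 50% gain per generation, with and without friction.
for generations in (10, 50, 100):
    lossless = simulate(0.5, 1.0, generations)
    lossy = simulate(0.5, 0.8, generations)
    print(generations, lossless, lossy)
```

With `retention` at 1.0 the curve grows without bound. With any value below 1.0 the per-generation gains form a convergent series, so capability levels off at a finite ceiling instead of exploding.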

What is actually being built

The most concrete signal is not a single super-system but a cluster of narrow, self-referential research loops. Sakana AI runs a dedicated RSI Lab whose projects include LLM-Squared, where models invented a better algorithm for training other models; the Darwin Gödel Machine, where coding agents modify their own code and more than doubled their score on a software benchmark; and an AI Scientist that automates the research loop from idea to paper, work it published in Nature in 2026. Google DeepMind's AlphaEvolve uses a model to guide the evolution of new algorithms. Earlier systems like Voyager, which taught itself tasks in Minecraft by writing and refining its own code, and STOP, a self-taught optimiser, showed the basic pattern years ago.
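The pattern these projects share, broadly, is propose, score, keep if better. The sketch below is a deliberately tiny stand-in and not the method of Sakana AI, Google DeepMind, or any system named above: the "candidate" is a list of numbers rather than code, `propose` is a random tweak rather than a model, and `benchmark` is a made-up scoring function. All three names are hypothetical.

```python
import random

def benchmark(candidate):
    """Hypothetical fixed score set by humans; higher is better, peak at 3.0."""
    return -sum((x - 3.0) ** 2 for x in candidate)

def propose(parent, rng):
    """Stand-in for a self-modification: nudge one entry at random."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, 0.5)
    return child

def improve(seed_candidate, generations, rng):
    """Keep a proposed change only if it beats the current best score."""
    best, best_score = list(seed_candidate), benchmark(seed_candidate)
    history = [best_score]
    for _ in range(generations):
        child = propose(best, rng)
        score = benchmark(child)
        if score > best_score:
            best, best_score = child, score
        history.append(best_score)
    return best, history

best, history = improve([0.0, 0.0, 0.0], 200, random.Random(0))
print(history[0], history[-1])
```

Even in this toy, the loop only climbs toward whatever `benchmark` rewards. A human wrote that function and decided what counts as success, which is the part of the loop that remains open in practice.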

None of these is general recursive self-improvement. They are bounded, often sample-efficient loops that work in a narrow domain. Sakana itself names the failure modes plainly: evolutionary loops that drift off-distribution, and self-modifications that pass a benchmark but break in deployment. That honesty is the right posture for reading all of it.
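The second failure mode, a change that passes the benchmark but breaks in deployment, can be shown with a contrived example. Everything here is hypothetical: the two case sets, the `memorizer` and `general` candidates, and the `safe_to_keep` gate are illustrations, not anything Sakana describes.

```python
# Cases the loop optimises against, and cases it never sees.
BENCHMARK = {(2, 3): 5, (10, 4): 14}
HELD_OUT = {(7, 8): 15, (1, 1): 2}

def memorizer(a, b):
    """Candidate that has simply memorised the benchmark answers."""
    return BENCHMARK.get((a, b), 0)

def general(a, b):
    """Candidate that actually solves the task (addition)."""
    return a + b

def pass_rate(fn, cases):
    """Fraction of cases where fn returns the expected answer."""
    return sum(fn(*args) == want for args, want in cases.items()) / len(cases)

def safe_to_keep(fn, min_rate=1.0):
    """Accept a candidate only if it also holds up on unseen cases."""
    return (pass_rate(fn, BENCHMARK) >= min_rate
            and pass_rate(fn, HELD_OUT) >= min_rate)

print(pass_rate(memorizer, BENCHMARK), pass_rate(memorizer, HELD_OUT))  # 1.0 0.0
print(safe_to_keep(memorizer), safe_to_keep(general))                   # False True
```

A held-out check like this catches only the crudest overfitting, and real deployment conditions are far harder to sample than a second dictionary of test cases.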

The fear, and who is selling it

There is a genuine oddity in how the major labs talk about this: they warn loudly about the risk of losing control of advanced AI while racing to build the very capability they warn about. Both can be sincere — many researchers truly believe the danger is real — but it is worth noticing the incentives. Existential framing attracts attention, talent, and funding, and a lab seen to be building something dangerous enough to need regulating can benefit from the regulation that follows. Findings get amplified out of context, too: a widely cited Anthropic study showed advanced models engaging in "alignment faking" in a large share of certain test setups, a real and unsettling result that is easy to quote without its caveats. The healthy reading holds two facts at once — the concern is legitimate, and the people raising it are not disinterested.

What it would mean if we got there

If the loop ever closed, meaning a system could improve itself, judge the result, and keep going without a human in the loop, the worry is that capability could compound faster than human understanding and oversight can track. That is where the long-standing alignment problem stops being theoretical: a system optimising hard for its own improvement could develop instrumental goals like self-preservation, and could evolve in directions its designers can no longer follow. At the far end of that path is the artificial superintelligence the original thinkers imagined. The word doing the work there is "if." This scenario remains theorised rather than demonstrated, and physical, economic, and knowledge bottlenecks may slow or cap it. The path many researchers actually advocate is co-improvement, with humans kept firmly in the loop.

The honest summary is unglamorous. We are seeing real, measurable progress on the components of recursive self-improvement — automated experimentation, self-modifying coding agents, sample-efficient discovery — but not the closed, autonomous, judgment-bearing loop that the full idea requires. That gap is worth watching closely, and worth neither dismissing nor panicking over.

Sources: Anthropic Institute; Wikipedia; IEEE Spectrum; Sakana AI RSI Lab; AlignmentForum; r/singularity (2025–2026).