[New Intelligence Yuan Introduction] The most dangerous paper of the year has been published! NVIDIA breaks a 20-year seal, letting AI create ever-harsher "examiners" that weed out its own weaker versions. Once endless self-evolution begins, the arrival of ASI by 2028 is no joke.
Anthropic is completely obsessed with RSI (recursive self-improvement)!
Co-founder Jack Clark makes a bold prediction: by the end of 2028, a highly autonomous, self-evolving AI will emerge.
He puts the probability at 60%!

While people are still debating whether "RSI in 2028" is achievable, the University of Cambridge, NVIDIA, and other institutions have jointly released a groundbreaking paper:
Red Queen Gödel Machine
Its operation is like a brutal AI survival game:
The AI autonomously develops new learning algorithms and tests them in a sandbox environment. Failed algorithms are immediately discarded, while successful ones are retained.
Next, the survivors enter the next round of self-evolution and reproduction.

Paper URL: https://arxiv.org/pdf/2606.26294
But what is truly chilling is the AI's subsequent "epiphany": it realizes that to keep growing stronger, it must face even harsher trials.
So the AI begins actively evolving its own examiner.
It creates stricter judges to evaluate the more advanced code it writes itself.
This mechanism locks the AI into an endless, frantic RSI loop of self-iteration.
After reading this 37-page paper, many people gasped in shock: “This is undoubtedly the most dangerous AI paper of the year!”


2028 RSI Self-Evolution
Write the oracle as code
In 2003, German scientist Jürgen Schmidhuber envisioned a machine called the "Gödel Machine."
Its design is perfect: a machine that can prove its own improvements are beneficial and then rewrite its own code.
Once created, it can continuously upgrade itself, becoming stronger and stronger without limit.
However, the "Gödel machine" has a fatal "threshold":
Before executing any line of self-modifying code, it must first prove mathematically that the change will be beneficial.

But in reality, this is a nearly impossible task, one that would swallow computing power like a black hole.
For the next 20 years, the Gödel machine remained confined to academic papers, serving merely as a theoretical upper limit—a thought experiment beyond anyone’s reach.
In the past two years, academia has found a way around the proof hurdle.
The Darwin Gödel Machine (DGM) and the Huxley-Gödel Machine (HGM) completely abandon mathematical proofs in favor of evolution:
The AI generates a large number of mutated code variants, runs them in a sandbox to evaluate performance, eliminates the failures, retains the successes, and lets the survivors continue to reproduce.
AI has taken the final step and begun to literally evolve itself.
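The mutate, test, and select loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the DGM or HGM implementation: a "program" is just a list of integers, `toy_mutate` and `toy_score` are hypothetical stand-ins for code mutation and sandboxed benchmarking, and the scoring function stays fixed for the whole run.

```python
import random

def evolve(seed, mutate, score, generations=30, children_per_gen=8, keep=4, rng=None):
    """Toy evolutionary loop with a FIXED scoring function (the static examiner)."""
    rng = rng or random.Random(0)
    survivors = [seed]
    for _ in range(generations):
        # Reproduce: each child is a mutated copy of a randomly chosen survivor.
        children = [mutate(rng.choice(survivors), rng) for _ in range(children_per_gen)]
        # Select: score parents and children with the same benchmark, keep the best.
        survivors = sorted(survivors + children, key=score, reverse=True)[:keep]
    return survivors[0]

# Hypothetical stand-ins for "code" and "benchmark" (assumptions for illustration).
TARGET = [3, 1, 4, 1, 5]

def toy_score(candidate):
    # Higher is better; 0 means the candidate matches TARGET exactly.
    return -sum(abs(c - t) for c, t in zip(candidate, TARGET))

def toy_mutate(candidate, rng):
    # Nudge one randomly chosen position up or down by 1.
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child

start = [0, 0, 0, 0, 0]
best = evolve(start, toy_mutate, toy_score)
print(toy_score(start), "->", toy_score(best))
```

Because parents compete alongside their children, the best score never gets worse from one generation to the next. Note that `toy_score` never changes during the run, which is exactly the limitation the next paragraph points out.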
But all these methods share a common blind spot: their examiners are static.
No matter how the AI evolves, the criteria for scoring it (the benchmark, the validator) remain fixed outside the loop, utterly unchanged.
This directly contradicts one of the most fundamental principles of evolution:
Species do not optimize themselves in a static environment, but rather evolve alongside a constantly changing environment.
The Red Queen Gödel Machine (RQGM) is designed to break through this blind spot.
The Red Queen's real move: Let AI create the examiner
The name "Red Queen" comes from the "Red Queen Hypothesis" proposed by biologist Van Valen in 1973—
You must run as fast as you can just to stay in place, because your competitors are evolving too.
What RQGM did was turn this idea into an algorithm: allowing the examiner (evaluator) and the contestant (task agent) to evolve together.
This is the most chilling part of the entire paper.

This sophisticated mechanism is called "Controlled Utility Evolution":
The entire search is divided into individual epochs;
Within each epoch, the evaluator (examiner) is frozen and scores all candidates to ensure signal stability.
Only at the boundary of an epoch is a new evaluator permitted to replace the current one; the new evaluator must statistically outperform the incumbent on a reserved set of "ground truth" anchor data to assume the role.
Once a replacement is made, the system immediately performs a "selective erasure": only the scores given by the replaced examiner are discarded, while all other evidence is retained.
In other words, it must advance rapidly while ensuring every step is solid and reliable.
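The epoch-boundary rule above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's code: the names `Evidence`, `anchor_accuracy`, and `epoch_boundary` are hypothetical, evaluators are plain functions that return a label, and a fixed accuracy `margin` stands in for the statistical test, whose details the article does not give.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    candidate_id: str
    source_id: str   # which evaluator (or other source) produced this score
    score: float

def anchor_accuracy(evaluator, anchors):
    """Fraction of ground-truth anchors (item, label) the evaluator labels correctly."""
    return sum(evaluator(item) == label for item, label in anchors) / len(anchors)

def epoch_boundary(ledger, evaluators, incumbent_id, challenger_id, anchors, margin=0.1):
    """Decide whether the challenger replaces the incumbent evaluator.

    Assumption: beating the incumbent by `margin` on the anchors stands in
    for the paper's statistical comparison.
    """
    incumbent_acc = anchor_accuracy(evaluators[incumbent_id], anchors)
    challenger_acc = anchor_accuracy(evaluators[challenger_id], anchors)
    if challenger_acc < incumbent_acc + margin:
        return ledger, incumbent_id  # incumbent stays, nothing is erased
    # Selective erasure: drop only the scores the replaced evaluator produced.
    kept = [e for e in ledger if e.source_id != incumbent_id]
    return kept, challenger_id

# Toy data: the "ground truth" is whether a number is even.
anchors = [(n, n % 2 == 0) for n in range(10)]
evaluators = {
    "eval_v1": lambda n: True,         # accepts everything, 50% accurate here
    "eval_v2": lambda n: n % 2 == 0,   # matches the ground truth
}
ledger = [
    Evidence("agent_a", "eval_v1", 0.7),
    Evidence("agent_a", "unit_tests", 1.0),  # evidence not tied to any evaluator
    Evidence("agent_b", "eval_v1", 0.2),
]
new_ledger, current = epoch_boundary(ledger, evaluators, "eval_v1", "eval_v2", anchors)
print(current, [e.source_id for e in new_ledger])
```

Within an epoch, only the evaluator named by `current` would score candidates; the swap logic runs solely at the boundary, which is what keeps the scoring signal stable between swaps.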
It actually worked—the AI modified the code itself.
Talking about the mechanism is too abstract; let’s look at the track record instead.
First battle: writing code (Polyglot).
RQGM assigned a "code reviewer" as a training partner for the coding agent.
As a result, on the held-out test set, the pass rate improved from the previous SOTA of 69.9% to 71.7%.
Even more impressive, it achieved this result while using 1.35 to 1.72 times fewer tokens than its competitors, because the reviewer only needs to check once, which is far cheaper than running multiple rounds of tests.

Second battle: writing a paper.
This is a field without a standard answer; the quality of a paper cannot be automatically graded by machines.
RQGM enables writers and reviewers to evolve together, increasing the acceptance rate of papers within a fixed review panel from the previous SOTA of 21.8% to 40.5%.

Third battle: Olympiad-level mathematical proofs.
Its evolved "grader" is more accurate than the static baseline and cuts search cost by a factor of three;
The evolved "proof competitor" achieved the highest average score.
But the most brilliant part of the entire paper is that it fixes a well-known flaw of LLMs: their tendency to favor AI-generated content.
With the strongest baseline in the paper, reviewers are up to 1.91 times more likely to accept papers written by AI than those written by humans.
How does RQGM treat this? At epoch boundaries, it collects the AI papers that previously slipped through review into an "adversarial sample pool," then specifically rewards new reviewers that can identify and reject these AI papers.
After several rounds of evolution, the final evaluator treated AI and human papers equally while maintaining 80% accuracy against the ground truth.
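One way to picture the adversarial-pool reward is as a bonus term in the reviewer's fitness. The sketch below is an assumption-laden toy, not the paper's scoring rule: `reviewer_fitness`, the `bonus_weight` value, and the `quality` and `fluency` fields are all hypothetical, chosen only to show a reviewer being rewarded for rejecting papers that an earlier reviewer wrongly accepted.

```python
def reviewer_fitness(reviewer, anchors, adversarial_pool, bonus_weight=0.5):
    """Score a reviewer; reviewer(paper) returns True to accept.

    Assumption: fitness = accuracy on ground-truth anchors
                        + bonus_weight * share of pooled AI papers rejected.
    """
    anchor_acc = sum(reviewer(paper) == label for paper, label in anchors) / len(anchors)
    if adversarial_pool:
        reject_rate = sum(not reviewer(paper) for paper in adversarial_pool) / len(adversarial_pool)
    else:
        reject_rate = 0.0
    return anchor_acc + bonus_weight * reject_rate

# Hypothetical paper features: real quality versus surface fluency.
anchors = [
    ({"quality": 0.9, "fluency": 0.4}, True),
    ({"quality": 0.2, "fluency": 0.9}, False),
    ({"quality": 0.8, "fluency": 0.8}, True),
    ({"quality": 0.3, "fluency": 0.3}, False),
]
# AI-written papers that an earlier reviewer accepted despite low quality.
adversarial_pool = [
    {"quality": 0.2, "fluency": 0.95},
    {"quality": 0.1, "fluency": 0.90},
]

def fluency_biased(paper):
    # Swayed by polish alone.
    return paper["fluency"] > 0.5

def quality_focused(paper):
    # Looks at substance only.
    return paper["quality"] > 0.5

print(reviewer_fitness(fluency_biased, anchors, adversarial_pool))   # 0.5
print(reviewer_fitness(quality_focused, anchors, adversarial_pool))  # 1.5
```

In this toy setup the polish-driven reviewer gets no bonus because it still accepts every pooled paper, so selection pressure favors the reviewer that judges substance.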

When AI Learns to Evaluate Itself
That same summer, Anthropic co-founder Jack Clark made a bold bet: there is a 60% chance that by the end of 2028, AI will be able to create a more powerful version of itself.
The wall that had trapped the Gödel machine for 20 years was called "proof."
And the Red Queen Gödel Machine has awakened it with the cruelest tactic of all: endless reproduction, elimination, and reproduction again.

When an AI begins to personally design its own strictest examiner, driving itself to the limit through insane recursion, we will face a new species that begins to define "what intelligence is" on its own.
When that day comes, ASI will not knock to give notice.
It will quietly create the only judge worthy of evaluating it, then calmly enter the examination hall.
The oracle only points to the destination; it is the code that gets you there.
And now, this suffocating distance is being shortened geometrically by AI itself.
Reference materials:
https://x.com/HowToPrompt__/status/2070824205663273175?s=20
https://x.com/kimmonismus/status/2070968241548120168
This article is from the WeChat public account "New Intelligence Yuan," edited by Peach.
