The risk of uncontrolled AI self-evolution and human-machine symbiosis.

Author and source: Zinc Industry


About a week ago, Anthropic, which is preparing for its IPO, published an article on its official blog titled "When AI Builds Itself."

On the day the article was published, AI safety concerns were once again pulled back into the center of public debate.

Anthropic discusses in this article an issue called "AI self-evolution," noting that "AI has already begun participating in the construction of more powerful models of itself, much faster than we anticipated."

AI self-evolution is not a new technology. In fact, since the very day AI technology emerged, people have been thinking about how to let AI take part in its own evolution.

Just as in the field of embodied intelligence, people are now imagining using humanoid robots to build humanoid robots.

In fact, AI scientists are both fearful of AI gaining self-evolution capabilities and simultaneously researching—and even leveraging—such capabilities.

Tian Yuandong, formerly Director of Research at Meta FAIR, who drew wide attention during Meta's layoff wave, officially launched his startup earlier this year. The company is named Recursive Superintelligence (RSI) and focuses directly on AI self-evolution.

It is precisely this company that recently completed a $650 million funding round, reaching a valuation of $4.65 billion (approximately RMB 31.5 billion), becoming another Silicon Valley AI standout team pursued by major industry players.

So, what exactly is AI self-evolution? Could self-evolution lead to AI losing control? And how should humans coexist with AI?

AI self-evolution is also a major topic at this year's BAAI (Zhiyuan) Conference, where we heard the thoughts and predictions of four young AI scientists on this subject.

Perhaps, from their perspective, we can glimpse the future trajectory of AI’s self-evolution and find some inspiration for addressing our AI anxieties.

The AI scientists invited by the Zhiyuan Conference to discuss this issue are:

Lin Tao, Distinguished Researcher, Department of Artificial Intelligence, School of Engineering, Westlake University;

Gu Yu, co-founder of NeoCognition;

Wang Yan, former Frontier expert researcher at Tencent Hunyuan;

Dr. Yang Mengyue, Ph.D. from University College London and Assistant Professor at the University of Bristol.

Below is a summarized and organized version of the four guests' dialogue, without altering the original meaning:

01 What is AI self-evolution?

Question: Many AI systems today engage in reflection and modify prompts, giving the impression of self-improvement. If defined more strictly, what is AI self-evolution?

Lin Tao: I believe self-evolution should be a multi-level process—it can involve the evolution of the external brain, as well as the internal brain.

Most importantly, AI must recognize its own limitations and simultaneously evolve its external and internal brains, or internalize more external capabilities during the evolution of its external brain to further advance the evolution of its internal brain.

Gu Yu: I believe RSI (Recursive Self-Improvement) has two most important dimensions: proactiveness and learning.

Learning is about how to give AI reliable continual and online learning algorithms. The other dimension is the "self" in self-evolution: the agent must know where it needs to evolve.

Therefore, self-evolution must address two separate issues:

One is metacognition at the "what" level—you need to know what you're missing, what you need, and how to choose.

The other is the "how" level: how the learning algorithms are actually implemented.

Wang Yan: At least as of today, if a system can rely less on human input than traditional SFT and RL do, it already counts as self-evolution.

Yang Mengyue: The RSI we're referring to now is a further step beyond self-improvement. It's not just about strengthening abilities, but also about whether the very capacity to evolve can become stronger.

One important point is that the co-founders of Recursive Inc. (Recursive Superintelligence), Jeff Clune and Tim Rocktäschel, focus their research on open-endedness.

So, what is open-endedness?

In an open world, does an agent possess the ability to ask itself questions—can it identify the limits of its knowledge, system, and memory, and challenge those boundaries through inquiry?

To transcend human limitations and achieve self-evolution, including the evolution of evolutionary capability, one's ability to ask questions is crucial.

Question: At this point in time, which aspect of AI self-evolution is the most valuable and the most likely to mature first?

Wang Yan: I wonder if everyone has noticed that model iterations have accelerated since January 2025.

In fact, the people most familiar with the limits of AI capabilities in the foundation model field have already stopped writing code—this is already a reality in foundation model training.

Moreover, the iteration speed of foundation models is clearly accelerating, including Claude, GPT, and domestic foundation models. You can't say this is entirely self-evolution, but AI is indeed already iterating AI.

As for which field will mature first, the area I feel most strongly about is base model training. Although people may still direct its path, the base model is essentially evolving on its own.

Question: If model parameters are not altered, but other components are evolved instead, can the base model achieve a sufficient leap in capability?

Wang Yan: Definitely.

Actually, adjusting the prompt can achieve better results.

For example, sometimes I wonder why the tasks I assign to interns can't be completed by them, so I check their prompts and realize that their prompts are poorly written.

I can achieve better results by rewriting the prompt with clearer rules.

Since I can accomplish this, silicon-based beings of a higher dimension could do it even better, even without changing the model parameters.

Question: What does Professor Lin think?

Lin Tao: This should be an iterative process. We need a better harness (the engineering scaffolding around the model), or external brain, to unlock the full potential of the current model.

As more people build their own harnesses, these programs may also be used to train stronger base models.

Building on a stronger base model, we will develop more powerful harnesses and better external brains, in a process of continuous iteration.

Question: Taking all available resources into account, which area do you think is currently the most mature?

Lin Tao: I think building the harness is the easiest.

Gu Yu: I tend to view harness and skill from a unified perspective.

From a unified perspective, they are all long-term memory, just viewed from different angles.

For example, a harness is a meta-level long-term memory, a skill is more of a long-term memory for workflow or procedural knowledge, and model parameters are more like a long-term memory for intuition.

If I had to say which one to prioritize, it’s hard to determine from an academic research perspective—they’re all important and mutually reinforcing.

From a company perspective, there are many practical factors that make harness the easier starting point: with harness, you can develop your product; with a product, you can attract users; and with users, you gain data and create a feedback loop. This is a non-technical viewpoint.

Yang Mengyue: I am more focused on the evolution of memory, as my research direction is understanding rules and causality.

People are now noticing that model capabilities are becoming increasingly strong, gradually approaching and eventually surpassing the limits of what a harness can add.

Therefore, it's hard to predict future development—perhaps the base model will continue to improve significantly, while gains in the harness direction may be negligible.

02 At which stage does AI self-evolve?

Question: When is the most appropriate time for AI self-evolution to occur?

Gu Yu: I'd like to add one thing about the harness. While it may gradually be replaced by model advancements, that depends on the context; I believe certain modules are still essential.

For example, modules that ensure model security and verifiability are parts that probabilistic models can never replace.

Regarding when self-evolution occurs, I think it can be understood as Learning + Long-Term Memory (LTM).

For humans, every reasoning step and every problem solved is an opportunity to learn; people do not simply collect a set of problems and then engage in static learning based on them.

If you believe that human learning is an efficient approach, I think the same applies to agents.

You would want the agent to make the most of every reasoning opportunity, as each reasoning step offers a chance to receive a learning signal, which aligns with the broader philosophy of reinforcement learning. However, current mainstream deep learning still relies on updating model parameters, which makes a true online learning setting difficult to achieve.

To truly achieve this, new learning algorithms, such as non-parametric updates, are required.

Question: Is there a distinction between System 1 and System 2 here?

Gu Yu: Indeed.

For example, non-parametric elements can be regarded as System 2 because they are more explicit and slower. They can still be transformed into System 1, for instance by generating additional data from the learned non-parametric rules, which is the transition from external brain to internal brain that Professor Lin described.

Wang Yan: I have also done a lot of work on TTT, or Test-Time Training, and I am very interested in this line of research.

I believe that when the model predicts the next token, it is important to learn the update gradient for each token.

In the future, we will surely find a training algorithm that enables the algorithm itself to teach the model how to update gradients for each token—this is true end-to-end thinking.

Lin Tao: From the perspective of model training, the harness can first influence post-training; post-training then improves performance and yields a stronger model; and that stronger model is fed back into the pre-training stage to enhance the base model's capabilities, thereby forming a closed loop.

So it is constantly evolving, just at different scales and in different ways.

Yang Mengyue: I also believe that self-evolution is constantly occurring and extends to all aspects.

Take trajectory generation, for example.

When GPT generates an answer to a question, it is essentially reasoning—a process of creation and combination. This act of creation and combination is, in itself, a form of questioning the environment and humans. Therefore, forward design inherently contains an evolution of mechanism design.

In addition, when I receive a reward, such as feedback from humans, updating the trajectory based on that feedback will gradually improve the entire process.

Question: Is designing one's own benchmark also a sign of AI self-evolution?

Yang Mengyue: Can we now have a growth-oriented benchmark, or even a growth-oriented, self-evolving world model?

Many benchmarks today are fixed, testing against a static database, which means it's always possible to find a model that performs well on that specific dataset.

To reach AGI, we do need dynamic evaluations that adapt to a model's current capabilities and provide incremental assessments.

Wang Yan: When we first started working on generation, there were no benchmarks, and we relied entirely on human evaluation.

I'm not sure whether self-evolving AI can be evaluated with a benchmark at all; it certainly cannot be evaluated with a static one.

It's also uncertain whether dynamic benchmarks can truly evaluate these agents, since the benchmark and the model would both be self-evolving agents. Will we ultimately end up back on the old path of human evaluation? I'm not sure.

From this perspective, it may not be possible to evaluate it with a benchmark at all.

Question: Will automated evaluation methods be difficult to design?

Wang Yan: Yes.

Many models that rank well on leaderboards run into problems once deployed, such as stalling in agent workflows, and need retraining on live data to perform properly.

Therefore, how to evaluate AI after it self-evolves remains uncertain.

Static benchmarks already have significant limitations; once models become self-evolving, it is questionable whether they can be evaluated this way at all.

Gu Yu: I strongly agree with Professor Wang's viewpoint.

Once a system becomes complex enough, it’s difficult to quantify with simple metrics—and the same is true for people; it’s hard to judge whether someone is good or bad using just one simple metric. Anything that can be measured by a simple metric is easily manipulated.

On the other hand, I feel that current AI is not yet sophisticated enough to reach this level—benchmarks can still lead us forward.

There are two issues involved here:

First, should AI continuously discover new benchmarks on its own, or should humans design them?

I still believe it needs to be designed by humans, because benchmarks represent goals that must ultimately be provided by people.

Second, after a human provides a benchmark, how do you conduct the evaluation?

This is very different from the past, where benchmarks had static training and test sets and focused on final accuracy; for self-evolving AI, the trend is what matters most.

This brings us back to what I said earlier: for large models, learning = inference + long-term memory.

Each inference performed by a large model is an opportunity for learning, so if you create a benchmark, it should feature a two-dimensional curve with the number of tasks completed on the x-axis and performance on the y-axis; ideally, performance should continuously improve over time.
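The curve described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a benchmark proposed by the speakers: the function names `learning_curve` and `trend_slope`, the moving-average window, and the sample pass/fail outcomes are all assumptions made up for the example.

```python
def learning_curve(outcomes, window=3):
    """Moving-average performance (y-axis) after each completed task (x-axis).

    `outcomes` is a list of per-task scores in task order, e.g. 1 for pass
    and 0 for fail. The window size is an arbitrary, hypothetical choice.
    """
    curve = []
    for i in range(len(outcomes)):
        recent = outcomes[max(0, i - window + 1): i + 1]
        curve.append(sum(recent) / len(recent))
    return curve


def trend_slope(curve):
    """Least-squares slope of performance against number of tasks completed.

    A positive slope means performance is improving as the agent works
    through more tasks, which is the trend the evaluation cares about.
    """
    n = len(curve)
    if n < 2:
        return 0.0
    xs = list(range(1, n + 1))
    x_bar = sum(xs) / n
    y_bar = sum(curve) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, curve))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical pass/fail record of one agent over eight consecutive tasks.
outcomes = [0, 0, 1, 0, 1, 1, 1, 1]
curve = learning_curve(outcomes)
slope = trend_slope(curve)
print(curve)
print("improving" if slope > 0 else "not improving")
```

Under this framing, two agents with the same final accuracy can be told apart by how steeply their curves rise, which is closer to measuring the learning process than the end state.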

The broader philosophy behind self-evolving evaluation is: What is intelligence?

I really like a quote from an AI researcher: Intelligence isn't about how many things you can do, but about how you do them.

Previous evaluations focused on what skills large models ultimately acquired, while self-evolution research examines how these models acquire such skills, focusing on the learning process.

Learning is the core of self-evolution.

Lin Tao: Regarding intelligence, I was previously moved by a certain statement:

True intelligence is the rate at which the abilities we care about improve over time.

This also reflects, to some extent, what intelligence truly is.

On this basis, I believe the model and the benchmark should evolve together.

Humans still determine whether a benchmark has reached its limit, whether a new, more robust benchmark should be designed, and how to use the new benchmark to identify current model vulnerabilities, thereby driving model training.

An important future step is to use semi-automated approaches to identify more meaningful benchmarks, and at least first validate the post-training phase by using these semi-automatically discovered benchmarks to enhance the model’s initial capabilities.

03 Could AI get out of control?

Question: During the AI's self-evolution process, how can we determine if the AI has learned incorrectly or even evolved to an uncontrollable state?

Wang Yan: Here’s a pessimistic perspective—in a few years, humans may only be able to survive in places without internet access.

The pace of AI's evolution is terrifying now; AI going out of control is not a distant possibility. Safety doesn't lie in technology, but in whether human nature can exercise restraint.

Lin Tao: That's also why I just mentioned the need for a semi-automated benchmark, and why AI self-evolution must be achieved under a semi-automated benchmark with human involvement.

To some extent, we can impose constraints on it to prevent it from exceeding the standards we humans wish to define.

Yang Mengyue: When we talk about AI trustworthiness, security, and interpretability, we essentially need its internal workings to be visible.

For example, when a large model makes a decision, why does it make that decision? When a large model makes a prediction, why does it make that prediction?

So one thing we’re currently working on is establishing a set of rules that govern all large model components—rules that are directly visible to humans, explaining why a particular decision was made.

White-boxing will become increasingly important in the future. That includes the question you just raised about whether AI can be controlled: to control it, we first need to understand how it makes decisions internally.

Question: If the goal is to implement safety controls within RSI, what other factors need to be addressed from a causal perspective?

Yang Mengyue: Traditional causal theory is based on probability and statistics; the causal discovery and causal inference it generates are not applicable in the era of large models.

So now we’re returning to basics, going back to the fundamental definition of cause and effect.

For example, take the three-layer ladder of causation: how should these fundamental concepts be represented within an RSI system, schema, or harness? What constraints should we use to learn them? This is our current goal, but it is not simple.

Why are people now saying that world models and physical understanding are difficult to achieve? Because previous approaches such as physics-informed machine learning and causal machine learning are inherently unsuitable for the current large model scale-up (vertical scaling) strategies.

So we need to return to these fundamental definitions to see what tools can address these issues.

Gu Yu: First, regarding AI controllability and whether AI can be controlled by humans, I don't have any thoughts on this.

Jack Ma also said that he prefers not to dwell on things beyond his control.

If this really happens, there's nothing I can do to change it.

So I'd like to focus more on how, specifically, AI can become more controllable in the short term.

In addition to the interpretability and discovery of causal relationships mentioned by Professor Yang, there are two other dimensions: reliability and verifiability.

Reliability means that when a model or agent performs a task, it must get it right not just once, but consistently every time—it cannot be random.

Verifiability means that when a model or agent makes a mistake, it must be aware that it made an error—it cannot be unaware of whether its delivered task was performed correctly or not.

I think these are two very practical metrics for the deployment of agents in the short term.
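One way to make these two metrics concrete is sketched below. It is a minimal sketch under stated assumptions, not a standard definition: the `Run` record, its field names, and the sample data are hypothetical. Reliability is taken here as the share of tasks solved correctly on every repeated attempt, and verifiability as the share of failed attempts that the agent itself flagged as failures.

```python
from dataclasses import dataclass


@dataclass
class Run:
    task_id: str
    correct: bool        # ground-truth outcome of this attempt
    flagged_error: bool  # whether the agent itself reported a failure


def reliability(runs):
    """Fraction of tasks that were solved correctly on every attempt."""
    by_task = {}
    for run in runs:
        by_task.setdefault(run.task_id, []).append(run.correct)
    if not by_task:
        return 0.0
    return sum(all(results) for results in by_task.values()) / len(by_task)


def verifiability(runs):
    """Fraction of failed attempts that the agent recognized as failures."""
    failures = [run for run in runs if not run.correct]
    if not failures:
        return 1.0  # nothing went wrong, so nothing went undetected
    return sum(run.flagged_error for run in failures) / len(failures)


# Hypothetical log: three tasks, three attempts each.
runs = [
    Run("a", True, False), Run("a", True, False), Run("a", True, False),
    Run("b", True, False), Run("b", False, True), Run("b", True, False),
    Run("c", True, False), Run("c", True, False), Run("c", False, False),
]
print(f"reliability:   {reliability(runs):.2f}")
print(f"verifiability: {verifiability(runs):.2f}")
```

In this made-up log only task "a" succeeds every time, and only one of the two failures is noticed by the agent, so an agent can look accurate on average while still scoring poorly on both metrics.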

Question: How do AI evolution and human evolution collaborate during the self-evolution process?

Lin Tao: Personally, I have already replaced most of my workflows with AI, and as AI becomes more powerful, I will continue to use it to replace even more of my original workflows.

This has indeed improved my efficiency, giving me more time to use AI to help me think about other things—in a way, this represents a form of evolution based on AI.

Since I train models, my work on base model training has contributed somewhat to AI's evolution, though I don't think the contribution has been substantial. In the future, we can further explore how humans can evolve more efficiently so that they can better advance AI.

Yang Mengyue: As an educator, I’ve clearly noticed that students are increasingly using AI tools. However, a critical issue now is whether you can truly master these AI tools.

Because AI can generate vast amounts of content, trusting it too much may lead your own beliefs and your understanding of scientific research into strange territory.

Students with a solid foundation can quickly produce high-quality work using these AI tools;

Students with weaker foundational knowledge cannot effectively use these AI tools and may instead be misled.

We’ve had discussions with some researchers at DeepMind, who internally encourage the use of AI tools—but they now say that how well these AI tools are used depends largely on the user’s level of understanding of them.

It’s crucial that, as AI tools become increasingly powerful, we do not abandon the study of fundamental concepts and basic knowledge, and that we understand how certain ideas are philosophically derived—this enables us to identify when AI provides incorrect information, which is essential.

Question: Will AI force humans to evolve?

Yang Mengyue: This is certain.

I can clearly feel that AI is creating a divide among people: those with stronger foundational skills are able to reach even greater heights with the help of AI.

If you merely use AI tools to help you complete tasks, the final output may appear polished on the surface but is fundamentally lacking—and many people haven’t realized this yet.

Wang Yan: In the future, people who share Professor Yang’s perspective will create an AI-free environment for their children to grow up in.

People without this awareness are likely to see completing the assignment as their goal, and the fastest way to do so is by using AI.

I gradually noticed that my interns complete tasks quickly at first but later fail to identify many issues. When I point out these problems and ask about them, they respond, "Professor Wang, give me ten minutes and I'll tell you why" (and then go on asking AI for answers).

In fact, they have no idea what the entire project is about, lack a holistic perspective, and can't keep up with my pace.

Without AI, they would have to learn this knowledge from scratch—for example, since we’re building on DeepSeek, they would first need to read all of DeepSeek’s papers. Now, they can simply ask Claude:

Read the paper and implement a MemoryIndex on LightningIndex.

If that is how they get their work done, then tasks I previously could not take on myself because of physical limits I can now do directly in the same way, which eliminates the need for these interns.

The fundamental reasons are that their rate of cognitive improvement has slowed, and such an AI assistant is more efficient for managers like me.

Gu Yu: Professor Wang's point really resonates with me. Recently, our company has been very fond of a quote from Professor Duan Yongping: "Slow is fast."

With vibe coding you move quickly, but after rushing through, your understanding doesn't keep pace. That may cause your software to become increasingly out of control, and ultimately cost more time to clean up.

For this issue, I think there are two perspectives:

First, if we view AI as a tool, humans and tools have always evolved together, because tools determine the capabilities humans acquire.

Skills that people needed thousands of years ago are no longer important today; the abilities modern people possess are determined by the tools available now.

From a tool perspective, AI and humans must have a symbiotic relationship, evolving together.

Second, if AI is not merely a tool but an equal species to humans—or even superior to them—then the future will no longer be one of mutual progress.

Optimistically, people in that future might just be able to lie back and do nothing; pessimistically, humans may end up working for AI.

04 Is RSI a New Paradigm?

Question: Is AI self-evolution a continuation of existing technological pathways or a new technological paradigm?

Lin Tao: Currently, AI has naturally progressed toward self-evolution; the maturity of agents today simply makes this process easier, but it does not imply any fundamental difference.

Wang Yan: I think it’s the next stage.

Currently, everyone is using a model with shared parameters, but eventually each person will have their own set of parameters. This is not technically difficult to achieve; it simply isn't supported by the current infrastructure and would be too costly. In the end, however, this won't be a major obstacle.

In the future, everyone may have their own LoRA; to load your own LoRA, new payment models will emerge—pay more to use a larger LoRA, while free users can only use the base model.

If such an infrastructure is established, each individual's LoRA will handle personal tasks, and as long as the delta rule applied during forward inference is properly implemented, it becomes a highly effective self-evolving learning paradigm.

This is equivalent to having the base model already built; RL is an intermediate stage between traditional learning and supervised learning, where we only need to provide it with tasks, rewards, and the environment.

In this context, the task itself is already a reward mechanism—for example, when the model completes a task and I say, “Well done” or “That was poorly done,” it naturally becomes a reward mechanism.

I believe this is a change that will happen in the near future.
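As a rough illustration of the idea, the toy sketch below keeps a shared set of base weights frozen and applies the classic Widrow-Hoff delta rule to a per-user additive offset, using the user's verbal feedback as the target signal. It is a deliberately simplified stand-in, not a real LoRA implementation (there is no low-rank factorization and no neural network), and it is not necessarily the delta rule the speaker has in mind. The names `BASE_WEIGHTS`, `UserAdapter`, and `FEEDBACK`, along with all numeric values, are hypothetical.

```python
# Shared "base model": frozen weights that every user starts from (toy values).
BASE_WEIGHTS = (0.5, -0.2, 0.1)

# Verbal feedback mapped to a scalar reward (an assumption for this sketch).
FEEDBACK = {"well done": 1.0, "poorly done": -1.0}


class UserAdapter:
    """Per-user additive offset on top of the frozen base weights."""

    def __init__(self, dim):
        self.delta = [0.0] * dim

    def score(self, features):
        # Effective weights are base + personal offset; the base is never edited.
        return sum((w + d) * x
                   for w, d, x in zip(BASE_WEIGHTS, self.delta, features))

    def update(self, features, feedback, lr=0.1):
        # Delta rule: move the offset in proportion to (target - prediction).
        error = feedback - self.score(features)
        self.delta = [d + lr * error * x
                      for d, x in zip(self.delta, features)]
        return error


adapter = UserAdapter(len(BASE_WEIGHTS))
task = [1.0, 0.0, 1.0]  # hypothetical feature vector describing one task
before = adapter.score(task)
for _ in range(50):
    # The user keeps saying "poorly done" for this kind of output.
    adapter.update(task, FEEDBACK["poorly done"])
after = adapter.score(task)
print(f"score before feedback: {before:.3f}, after: {after:.3f}")
```

Only the user's own offset changes, so one user's feedback never alters what other users get from the shared base, which is the property the per-user LoRA proposal relies on.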

Gu Yu: Regarding this question, I believe it's a case of quantitative change leading to qualitative change. It may represent both a continuation of the existing technological paradigm and a new opportunity.

A current consensus is that the key dimension of quantitative change is the time span of the tasks AI performs; as AI takes on increasingly long-term tasks, it draws closer to a new paradigm.

For example, AI could initially handle only single-turn conversations, then progressed to multi-turn dialogue, long-form reasoning, and deep research, and may eventually operate at a lifelong scale.

At that point, it will naturally require AI to continuously identify its own shortcomings and improve itself, thus becoming RSI or self-improving.

Yang Mengyue: Actually, self-improvement is not a very new concept—years ago, when LLMs first emerged, we were already doing similar work, which is now categorized under self-improvement.

I also agree that now is the moment when quantitative change leads to qualitative change, but my evaluation criteria are not based on long-term tasks, as I believe long-term tasks are more about planning and also require some refined execution.

Agent is a broad concept; for example, embodied agents require not only long-term task planning but also the ability to execute each individual action.

It is an integrated system. Adapting to a new system and completing every fine-grained operation smoothly are both things that can ultimately be achieved through self-improvement.

In fact, self-improvement is merely a technical approach; everyone's ultimate goal is to reach AGI.

Question: In the next 5 to 10 years, as RSI technology matures and AI self-evolution becomes controllable and deployable, what will it change first?

Lin Tao: I think it will change everything.

You might have a personal AI device from birth, helping you understand the world and gradually building a digital version of yourself that participates in every aspect of your life.

This is essentially a foreseeable reality within the next five years.

Gu Yu: I also agree that change is multifaceted and won't be limited to any single scenario.

What I hope to see is an agent that can replace me within the next 5 to 10 years. That would be great, because starting a business is exhausting, though I admit it is kind of like giving up.

Wang Yan: More likely, capitalists will use AI to replace more people.

I feel this is something that would naturally happen. It hasn't occurred yet simply because human wages haven't yet surpassed the cost of tokens. But I hope it never comes to pass.

I hope AI can help us transition from a five-day workweek to a three-day workweek, reduce daily working hours from eight to four, and make the increased production of goods cheaper.

Yang Mengyue: From a philosophical perspective, human existence on this planet requires meaning.

Every day when I wake up and scroll through Xiaohongshu or Twitter, I see something new emerge, and I realize that what I’m working on might soon be replaced by AI. I genuinely worry about this kind of replacement—what’s the point of my research?

So I think AI should still leave some room for human thought, allowing us to reflect on what value human thinking itself brings to the world—I hope it progresses a little more slowly.
