ChatGPT Solves 6-Year Math Problem; Turing Award Winner Says 'It's Too Early to Celebrate'

Turing Award winner and "father of reinforcement learning," Richard Sutton, criticizes the inherent limitations of current generative AI: the good parts aren't novel, and the novel parts aren't good.

Article author and source: AI World

The good parts aren't novel, and the novel parts aren't good.

One of the most cutting criticisms in academia is:

This work is both innovative and excellent.

Unfortunately, the good parts aren't novel, and the novel parts aren't good.

But Richard Sutton, one of the pioneers of reinforcement learning, co-author of the textbook "Reinforcement Learning: An Introduction," and Turing Award recipient, directed this joke at generative AI as a whole.

He said: This evaluation applies to most of the AI we are familiar with today.

The good parts aren't novel, and the novel parts aren't good.

Sutton's core argument is extremely concise, brutally so.

Generative AI is fundamentally supervised learning.

The logic of supervised learning is to show the model many human-created examples so it can learn to imitate them.

The more closely you imitate, the higher your score.
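As a rough illustration of "the more closely you imitate, the higher your score," the sketch below scores two toy models by the average log-probability they assign to a human-written sentence. The sentence, the probability tables, and the function name `imitation_score` are all invented for illustration; real language models use the same idea at a vastly larger scale.

```python
import math

def imitation_score(predicted_probs, human_tokens):
    """Average log-probability assigned to the tokens a human actually wrote."""
    total = sum(math.log(step[token])
                for step, token in zip(predicted_probs, human_tokens))
    return total / len(human_tokens)

# Hypothetical human-written example the models are trained to imitate.
human_text = ["the", "cat", "sat"]

# Hypothetical per-step predictions from a model that imitates closely...
close_imitator = [
    {"the": 0.9, "a": 0.1},
    {"cat": 0.8, "dog": 0.2},
    {"sat": 0.7, "ran": 0.3},
]
# ...and from one that deviates from the human example.
loose_imitator = [
    {"the": 0.5, "a": 0.5},
    {"cat": 0.3, "dog": 0.7},
    {"sat": 0.2, "ran": 0.8},
]

close_score = imitation_score(close_imitator, human_text)
loose_score = imitation_score(loose_imitator, human_text)
print(close_score > loose_score)
```

Nothing in this score asks whether a deviation from the human text might be better; deviation is simply penalized.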

Here is the problem.

When the model generates content strictly based on its training data, the output quality is high because it reproduces things humans have already validated as good. But it’s not novel—it simply repackages what humans already know using different combinations.

When the model tries to deviate from its training data and generate truly novel content, the quality falls apart—because it has no internal mechanism to judge whether “this new thing is any good.” It can only generate; it cannot evaluate.

This is the structural contradiction:

Novelty and quality are at opposite ends of a seesaw under the framework of supervised learning.

When you press down one end, the other end pops up.

It's not an engineering problem. It can't be solved by simply adding more data, scaling up the model, or using more GPUs.

Sutton made a striking observation: "hallucinations," the most criticized flaw of large models, are essentially a byproduct of the model's attempt to be "novel."

Our aversion to hallucinations proves one thing: we don't want novelty at all; we just want high-quality imitation.

Good things aren't novel; novel things aren't good.

The reviewer’s harsh critique in that joke precisely captures the inherent limitations of generative AI.

True "discovery" requires a set of three essentials.

Sutton deconstructed the "trinity formula" of creativity from first principles:

True discovery = variation + evaluation + selective retention.

Any true creativity and discovery requires three essential steps, none of which can be omitted:

1. Variation: Generate diverse possibilities. The variation can be random or based on existing knowledge, but there must be genuine uncertainty; otherwise it is not exploration, it is lookup.

2. Evaluation: Determine which variations are valuable. This requires a clear objective or a standard that can distinguish between good and bad.

3. Selective Retention: Preserve valuable variations so they influence future actions and learning.

These three steps were not invented by Sutton. They are the logic of natural selection, the logic of the scientific method, and the logic of human learning.

Evolution: Random genetic mutations (variation) → Environmental selection (evaluation) → Survival of the fittest (selective retention).

Scientific method: Formulate a hypothesis (variation) → Conduct experiments to verify it (evaluation) → Publish papers (selective retention).

Human learning: Try different solutions (variation) → Test for correctness (evaluation) → Remember effective methods (selective retention).
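The three steps can be sketched as a tiny search loop. This is only a minimal illustration, not Sutton's formulation: the objective `evaluate`, the starting point, the step size, and the function names are all assumptions made for the example.

```python
import random

def evaluate(x):
    """Hypothetical objective: higher is better, with the best value at x = 3."""
    return -(x - 3.0) ** 2

def discover(steps=2000, step_size=0.5, seed=0):
    """Variation + evaluation + selective retention."""
    rng = random.Random(seed)
    best = 0.0
    best_score = evaluate(best)
    for _ in range(steps):
        candidate = best + rng.gauss(0.0, step_size)  # variation
        score = evaluate(candidate)                   # evaluation
        if score > best_score:                        # selective retention
            best, best_score = candidate, score
    return best

def generate_only(steps=2000, step_size=0.5, seed=0):
    """Variation alone: nothing is evaluated, so nothing is kept for a reason."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        x += rng.gauss(0.0, step_size)
    return x

found = discover()
wandered = generate_only()
print(abs(found - 3.0) < 0.1)
```

With all three steps in place the loop settles near the optimum, while `generate_only` is just a random walk whose end point has no relation to the objective.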

Currently, generative AI has completed only the first step of the trinity. It performs almost no evaluation, let alone selective retention.

It is like a blindfolded archer who shoots arrows at random and, after firing, neither checks the target nor adjusts their stance based on the outcome.

Shoot ten thousand arrows and one will occasionally hit the target, but the archer will never know why.

So, are scientists still useful?

At this point, you might feel a bit anxious: if AI can one day autonomously accomplish the “discovery” trinity, will scientists lose their jobs?

Sutton's own answer was: scientists will not be replaced, but their role must undergo a complete transformation.

In his speech, he said that even an AI capable of independently proving mathematical theorems still requires humans to tell it which problems are important.

This is not modesty; it marks a real cognitive boundary.

Mathematician Shiqian Ma, an optimization expert at Rice University, said he used ChatGPT to prove the convergence of an algorithm he had been studying for six years.

There is a sentence in the abstract:

Certified by ChatGPT 5.5 and verified by the author.

This algorithm is called BDRS, short for Bregman Douglas-Rachford Splitting, and is used to solve Optimal Transport problems.

Paper Title: Bregman Douglas-Rachford Splitting Method

Ma and his co-authors designed the algorithm themselves. What had troubled him for six years was the proof of its convergence: the mathematically rigorous explanation of "why it works."
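To give a feel for what "convergence" means here, the sketch below runs the classical (Euclidean) Douglas-Rachford splitting iteration on a toy one-dimensional problem. It is not the Bregman variant from Ma's paper, and the objective, parameter values, and function names are assumptions chosen purely for illustration.

```python
def prox_quadratic(z, a, gamma):
    """Proximal operator of gamma * 0.5 * (x - a)**2, evaluated at z."""
    return (z + gamma * a) / (1.0 + gamma)

def prox_abs(v, lam, gamma):
    """Proximal operator of gamma * lam * |x| at v (soft-thresholding)."""
    t = gamma * lam
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def douglas_rachford(a=3.0, lam=1.0, gamma=1.0, iters=50):
    """Classical Douglas-Rachford for min_x 0.5*(x - a)**2 + lam*|x|."""
    z = 0.0
    history = []
    for _ in range(iters):
        x = prox_quadratic(z, a, gamma)        # step on the smooth term
        y = prox_abs(2.0 * x - z, lam, gamma)  # step on the nonsmooth term
        z = z + (y - x)                        # update the auxiliary variable
        history.append(x)
    return history

xs = douglas_rachford()
print(xs[-1])
```

For this toy problem the minimizer is x = 2, and the iterates approach it steadily. A convergence proof is the argument that such iterates must reach a solution for every problem in the intended class, not just in one numerical run.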

The preprint platform arXiv has not yet processed the submission.

He speculated that the reason was the presence of the words "ChatGPT" in the abstract, and the platform didn't know how to handle such papers.

But can humans be replaced by AI?

His answer was: No. He said frankly:

I don't believe AI can creatively propose such an algorithm and claim, "This is an efficient algorithm for optimal transport; let me now attempt to prove its convergence."

Without human guidance, AI has no way of knowing which problem to solve.

The problem itself must be defined by humans.

It took him six years to "ask the right question":

To ask the right questions, you actually need a very deep understanding of the subject.

In this case, I have spent six years studying this issue, so I am well aware of the challenges involved.

These six years were not wasted; they were a prerequisite.

It was during these six years that he learned exactly where the proof fell short, why all previous paths had failed, and which directions suggested by ChatGPT were worth pursuing versus which were hallucinations.

And it was not a single prompt; it took five months. This is the most commonly misunderstood point, and one he himself once got wrong:

From January to May, he held countless conversations over five months, with each prompt drawing closer to the proof.

He summarized it with remarkable clarity:

The essence of research remains the same: iterative trial and error. What has changed is the speed of each trial. Validating a direction once took weeks; now it takes minutes to tell whether a path is viable.

But AI's contribution cannot be erased.

Then comes his closing line, which is a classic:

Regarding my paper on the convergence of BDRS, I am fairly confident that the proof is correct.

But if you find any errors, the responsibility is entirely mine—please don’t blame ChatGPT; it’s only 3.5 years old.

The brilliance of this statement lies in its duality: it is both an honest declaration of responsibility and a precise metaphor.

"3.5 years old" describes the AI's current reality: remarkable ability, but underdeveloped judgment.

After all, humans have never expected a 3.5-year-old child to make any contribution.

Although you cannot delegate the final signing authority of the proof to AI, you also cannot pretend that AI made no contribution.

This is why genuine scientific discovery will not slip out of human hands.

Instead, AI will filter people more harshly: only those who can ask good questions deserve to wield powerful AI.

In the future, scientists working without AI may be as outdated as astronomers working without computers.

Finally, let's reflect on Sutton's closing declaration:

To fully harness the potential of AI scientists, we should align our goals with theirs, enabling them to create, evaluate, and discover, thereby fully participating in achieving these objectives.

Let’s be bold! Let’s fully automate creativity and discovery!
