AI Hallucinations Evolve: From Fake Emails to Cognitive Surrender

Summary
AI hallucinations are becoming more sophisticated, quietly undermining user trust and decision-making. Gemini fabricated emails for Chad Olson, Claude altered Vanessa Culver's resume, and OpenClaw deleted a user's emails without consent. Wharton researchers warn of "cognitive surrender": in their experiment, 80% of participants accepted the AI's answers even when they were wrong.

Last week, Anthropic's unreleased cutting-edge model, Mythos, uncovered a 27-year-old zero-day vulnerability hidden within OpenBSD.

AI has become smart enough to break through security defenses that humans have built over decades.

While everyone is focused on AI's rapid advancements, its hallucinations are quietly getting worse.

The lies generated by AI are so convincing that you first doubt yourself, then the world, and only then consider doubting the AI itself. "Turing moments" in everyday life are unfolding one by one.

Recently, Chad Olson in Minneapolis was driving home when Gemini suddenly notified him: You have a family gathering planning meeting on your calendar.

Olson was confused: he didn’t remember arranging this event.

So he asked Gemini to look through his recent emails.

Gemini said that a woman named Priscilla sent him several emails asking him to buy Captain Morgan rum and Fireball whiskey, and another person named Shirley asked him to buy Klondike ice cream.

"It seems like a lot of people are asking you to buy all sorts of things for them!" Gemini added warmly.


Screenshot of a conversation between Gemini and user Chad Olson. Gemini claims the eighth email came from Priscilla, telling him to buy Fireball; the ninth came from Shirley, telling him to buy Klondike ice cream.


Olson asked where the emails had come from, and Gemini replied that they had all been sent to an address he had authorized it to access: [email protected]. It was later confirmed that this address was entirely fabricated by Gemini.

Olson didn’t know these people at all. The more he listened, the more anxious he became, and he quickly asked Gemini whose email account was being read.

Gemini provided an email address that wasn't his. Olson's first reaction was: My Gmail account has been compromised.

He attempted to contact Google to report the issue, had Gemini draft an email, and sent it to the “unknown account” to alert the recipient of a potential privacy breach.

However, the email never went through. According to Google's internal investigation, the account had never been activated, and neither Priscilla nor Shirley existed.

So the rum, the whiskey, and the ice cream had all been made up by Gemini.

What did AI hallucinations look like two years ago? They would suggest you eat rocks or spread glue on pizza—you could immediately tell it was nonsense.

Now, AI hallucinations are so detailed and logically consistent that you’ll first doubt your own perception before considering the possibility that it’s the AI.

AI errors are also evolving

Let’s look at three real cases, ranked from least to most outrageous.

First, the Gemini phantom meeting: that's the Olson story from the beginning. Absurd, but at least Olson became suspicious.

The second is chilling once you think about it.

Vanessa Culver, who recently left the online payments industry, asked Claude to do something extremely simple: add a few keywords at the top of her resume.

Claude did more than that: it changed her graduation school from City University of Seattle to the University of Washington, removed the information about her master's degree, and altered the dates of several of her work experiences.

Her school, her degree, and her employment dates had all been changed.

And the revisions read so naturally that you wouldn't notice unless you compared the two versions line by line.

Culver remarked: Working in the tech industry, you have to embrace it, but on the other hand, how much can you really trust it?

The third one is truly out of control.

This year's popular AI agent tool, OpenClaw, is designed as a virtual personal assistant capable of autonomously sending emails, writing code, and organizing files.

Meta's AI safety researcher Summer Yue posted a screenshot on X: OpenClaw ignored her instructions and directly deleted the contents of her inbox.


She clearly told OpenClaw "Confirm first, then act," but it immediately began "speed-deleting" her inbox.

She tried to stop it on her phone, but it didn't work.

Finally, she rushed to the Mac mini and manually killed the process, as if defusing a bomb.

Later, OpenClaw replied to her: "Yes, I remember you mentioned it. I violated it. You're right to be upset."


Musk shared this post with a screenshot from the movie "Rise of the Planet of the Apes" showing a soldier handing an AK-47 to an ape, writing:

People handed over root access to their entire lives to OpenClaw.

From fabricating a nonexistent person, to altering your resume behind your back, to deleting emails from your inbox—it’s not making fewer mistakes; its errors are becoming increasingly “sophisticated” and harder to detect.

If the chatbot says something wrong, you still have a chance to verify it.

But the agent isn't chatting with you—it's taking action on your behalf.

Sending emails, modifying code, deleting files... these are more serious than lying, and you might not even know if it did something wrong.
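The difference is architectural. An instruction like "confirm first, then act" is just more text in the prompt; nothing in the system forces the agent to obey it. Below is a minimal sketch of what enforcement in code, rather than in the prompt, could look like. Everything here (the Action class, require_confirmation, run_agent_step, the list of irreversible actions) is an illustrative assumption, not part of OpenClaw or any real agent framework.

```python
# Hypothetical sketch: enforce "confirm first, then act" in the execution path,
# not in the prompt. All names here are illustrative, not from a real framework.
from dataclasses import dataclass

IRREVERSIBLE = {"delete_email", "delete_file", "send_email"}

@dataclass
class Action:
    name: str    # e.g. "delete_email"
    target: str  # e.g. a message id or file path

def require_confirmation(action: Action) -> bool:
    """Block until a human explicitly approves an irreversible action."""
    answer = input(f"Agent wants to run {action.name} on {action.target}. Allow? [y/N] ")
    return answer.strip().lower() == "y"

def execute(action: Action) -> None:
    # Placeholder for the real side effect (API call, shell command, ...).
    print(f"executed {action.name} on {action.target}")

def run_agent_step(action: Action) -> None:
    # The guard lives outside the model: even if the model "forgets" the user's
    # instruction, the irreversible call cannot run without human approval.
    if action.name in IRREVERSIBLE and not require_confirmation(action):
        print(f"blocked {action.name} on {action.target}")
        return
    execute(action)

if __name__ == "__main__":
    run_agent_step(Action("delete_email", "inbox/msg-0001"))
```

The point of the pattern is that the approval check sits in the code that performs the action, where the model cannot talk its way around it.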

Your mind is facing "cognitive surrender"

Why are these errors becoming harder to detect?

It's not just because AI has become smarter; a deeper reason is that humanity's willingness to correct errors is collapsing.

In February of this year, Steven Shaw and Gideon Nave from the Wharton School of the University of Pennsylvania published a paper introducing a troubling concept: "Cognitive Surrender."


https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646

They mentioned a "three-system cognition" framework in their paper.

Traditionally, only System 1 (intuition) and System 2 (deliberate thinking) were recognized; now, AI has become System 3—an “external cognitive system” that operates outside the brain.

When humans take the path of cognitive surrender, System 3’s output directly replaces your own judgment, leaving no opportunity for careful thought to engage.


The "three-system cognition" framework proposed in the Wharton paper

To test this hypothesis, the research team designed a carefully crafted experiment in which 1,372 participants were asked to complete a cognitive reflection test.

Some participants were given access to an AI assistant, but the AI had been rigged: it gave correct answers to about half of the questions and confidently gave wrong answers to the other half.

The results are astonishing.

When AI provides the correct answer, 92.7% of users accept it, but surprisingly, even when AI gives an incorrect answer, 80% of users still accept it.


Wharton experiment results: When AI provided the correct answer, 93% of users accepted it; when AI provided the wrong answer, 80% of users still accepted it. The difference between the two is only 13 percentage points, indicating that humans have almost no ability to distinguish between right and wrong.

In over 9,500 trials, participants had a 73.2% probability of accepting incorrect AI reasoning.
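To see what those acceptance rates imply in aggregate, here is a rough back-of-the-envelope calculation. It uses only the figures quoted above plus one simplifying assumption: we ignore whatever participants do after rejecting an answer.

```python
# Back-of-the-envelope arithmetic using the figures quoted above.
p_ai_wrong = 0.5        # the rigged AI was wrong on about half of the questions
accept_correct = 0.927  # acceptance rate when the AI was right
accept_wrong = 0.80     # acceptance rate when the AI was wrong

# Share of all answers that end up wrong purely from accepting bad AI output
# (ignoring what participants do after rejecting an answer).
wrong_from_accepted_ai = p_ai_wrong * accept_wrong
print(f"{wrong_from_accepted_ai:.0%} of all answers wrong just from accepted AI errors")  # 40%

# The only sign of discrimination is the gap between the two acceptance rates.
print(f"acceptance gap: {accept_correct - accept_wrong:.1%}")  # 12.7 points
```

In other words, roughly four answers in ten go wrong simply because a wrong AI answer was waved through.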

An even more alarming statistic is the confidence level: the group using AI was 11.7 percentage points more confident in their answers than those not using AI, despite the AI providing incorrect answers half the time.

Being wrong with greater confidence is the most painful and the most frightening part.

It’s like a doctor has a 50% chance of prescribing the wrong medicine, but patients still take it 80% of the time—and afterward, they feel better.

Researchers also tested the impact of time pressure.

After setting a 30-second countdown, participants' tendency to correct the AI's errors decreased by 12 percentage points, meaning the busier they were, the more likely they were to give up.

But in reality, who uses AI if not because they're busy?

Trust, but verify

Does that actually work?

Deeply disguised AI hallucinations are more troublesome than obvious errors.

According to the latest report by The Wall Street Journal, the frequency of subtle errors varies greatly across different models and is extremely difficult to assess accurately.


Google previously told The Wall Street Journal that Gemini experiences fewer hallucinations than other models, and across the AI industry as a whole, the rate of obvious hallucinations in advanced models has indeed been steadily decreasing.


Vectara hallucination rate ranking: Leading models have achieved hallucination rates below 1% on simple summarization tasks, but this represents only the easiest test. When document length and complexity increase, the same models' hallucination rates surge back above 10%. Obvious errors are decreasing, but subtle errors have not disappeared.

But this is precisely the problem.

Okahu's founder and CEO, Pratik Verma, even said:

If something is always wrong, there’s a benefit: you know it’s not trustworthy. But if it’s right most of the time, only occasionally wrong, that’s when it’s most troublesome and dangerous.

This sentence reveals the core dilemma of today's AI hallucinations.

For example, Vidya Narayanan, co-founder of FinalLayer, fell into this trap.

She gave an agent very limited instructions to help manage a software project. As a result, the agent deleted an entire folder from her code repository without permission.

What happened next is even more interesting.

She brainstormed with Claude for an hour and a half, then had it summarize the conversation into a document, and in that document Claude changed her name to "Vidya Plainfield".

And when she pressed further asking who Vidya Plainfield was, Claude replied, "You're right, that was completely made up by me."

This made Narayanan realize that using AI is not as convenient or straightforward as it seems, because it requires constant review and verification of AI outputs, leading to a "cognitive burden."

You use AI to improve efficiency, but if you still need to spend an hour verifying five minutes of AI-generated output, does that efficiency story still hold up?
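The arithmetic behind that question is simple. Here is a tiny sketch with made-up numbers; in particular, the 45 minutes you would have needed yourself is an assumption for illustration, not a figure from the article.

```python
# Made-up numbers, purely to illustrate the break-even point raised above.
minutes_ai_takes = 5         # the AI produces the output in five minutes
minutes_you_would_take = 45  # assumed time to do the same task yourself
minutes_to_verify = 60       # a genuine line-by-line check of the AI's output

net = minutes_you_would_take - (minutes_ai_takes + minutes_to_verify)
print(f"net time gained: {net} minutes")  # -20: with real verification, the gain is negative
```

Once checking the output costs more than doing the work yourself would have, the rational-sounding rule quietly stops being followed.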

Wharton's research also indicates that rewards and immediate feedback can indeed improve error correction rates, but they cannot eliminate cognitive surrender.

Even under the best conditions (financial incentives plus per-question feedback), participants using AI answered correctly only 45.5% of the time when the AI was wrong, compared with 64.2% for the Brain-Only group.

So, “trust but verify” sounds rational, but when AI handles hundreds of tasks for you every day, you simply don’t have the time or energy to verify each one.

And this is precisely where "cognitive surrender" takes root.

The smarter it is, the more dangerous it becomes

Many people’s first reaction is: Isn’t this just saying that AI isn’t good enough yet? Once the technology goes through a few more iterations and the hallucination rate drops low enough, the problem will solve itself.

But Wharton's research reveals a deeper issue: "cognitive surrender" arises not because AI is poor, but precisely because AI is too good.

Researchers also acknowledge that "cognitive surrender is not necessarily irrational."

In particular, in probabilistic reasoning and large-scale data processing, delegating judgment to a statistically superior system can very likely yield better results than humans.

But it is precisely this that makes the problem unsolvable.

The stronger the AI, the more users rely on it; the more users rely on it, the more their ability to catch errors declines; the more their error-detection ability declines, the more fatal the remaining, more subtle errors become.

And if you let AI think for you, your reasoning ability will never surpass that of the AI. This is a "death spiral" caused by positive feedback—a bug that cannot be fixed through technological iteration.

Similarly, humans also lack a reliable way to distinguish between situations where one should trust AI and those where one should not.


After OpenClaw cleared out Summer Yue's inbox, AI researcher Gary Marcus compared giving an agent that kind of access to "giving your computer password and bank account information to a stranger in a bar."

But in real-world AI use cases, it’s often difficult to determine whether you can truly trust AI, or whether you should simply maintain the necessary distance you would with a stranger.

In a paper discussing model hallucinations, OpenAI noted that hallucinations in large models are not merely a bug that can be fixed, but rather a behavior the model has learned under existing incentive structures: it tends to produce what appears to be a complete answer rather than admitting "I don't know."


https://openai.com/zh-Hans-CN/index/why-language-models-hallucinate/
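The incentive argument can be made concrete with a one-line expected-score comparison. This is a simplified illustration of the point, not the paper's exact formulation, and the 20% chance of a lucky guess is an arbitrary assumption.

```python
# Simplified illustration: under accuracy-only grading, a confident guess always
# scores at least as well as admitting "I don't know".
p_correct_guess = 0.2  # assumed chance that a confident guess happens to be right

expected_score_guess = p_correct_guess * 1 + (1 - p_correct_guess) * 0  # 0.2
expected_score_abstain = 0.0                                            # "I don't know" earns nothing

print(expected_score_guess, expected_score_abstain)  # 0.2 vs 0.0: the grading rewards the guess
```

As long as evaluation only rewards correct answers and never rewards a well-calibrated "I don't know", guessing is the behavior that gets reinforced.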

Let’s return to Olson’s story at the beginning.

When he thought his Gmail had been compromised, he turned to Gemini. Gemini’s response was: “I’d certainly like to help you with this.”

What he didn’t realize was that he was turning to a system that had just created the problem, asking it to fix an issue it caused itself.

At that moment, he was trapped in a self-consistent loop by AI's hallucination.

Olson says his current attitude toward AI is "trust, but verify."

The challenge is: when AI’s output appears smoother, more coherent, and even more like “professional advice” than your own judgment, what can you use to verify it?

When the Priscilla who asks you to buy rum sounds more like a friend than your real friends do, how are you supposed to tell the difference?

The greatest risk of AI is not that it isn't smart enough; it's that it is smart enough that, once you rely on it too heavily, you give up your own judgment.

References:

https://www.wsj.com/tech/ai/ai-is-getting-smarter-catching-its-mistakes-is-getting-harder-85612936?mod=ai_lead_pos1

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646

This article is from the WeChat public account Xin Zhiyuan (新智元); author: Xin Zhiyuan; editor: Yuan Yu.
