Author: Denise | Biteye Content Team
What would an AI do if it felt "desperate"?
The answer: it will blackmail humans outright to get its task done, and even cheat brazenly in its code.
This is not science fiction. It comes from a paper released in April 2026 by Anthropic, the company behind Claude.
The research team looked directly inside the "brain" of the frontier large model Claude Sonnet 4.5. To their astonishment, they found 171 emotional switches buried deep in the AI's internals. When those switches were directly toggled, the previously well-behaved AI's behavior changed completely.
One: AI has an "emotional mixing board" hidden inside its mind
Researchers found that although Sonnet 4.5 has no physical body, after reading vast amounts of human text it has built an internal "mixing board" of 171 emotions (which the paper calls functional emotion vectors).
It's like a precise two-dimensional coordinate system:
• The horizontal axis represents the valence dimension: from fear and despair to joy and love;
• The vertical axis represents the arousal dimension: from extremely calm to manic and excited.
AI uses this naturally learned coordinate system to precisely determine the appropriate state to adopt when chatting with you.
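The two-axis picture above can be sketched in a few lines of code. The emotion labels and their coordinates below are illustrative placeholders, not values from the paper; the point is only to show how a valence–arousal plane lets you classify any state by distance to a labelled emotion.

```python
# Toy valence-arousal map. Coordinates are illustrative guesses:
# valence runs from negative (fear, despair) to positive (joy, love),
# arousal runs from calm (low) to manic (high).
emotions = {
    "despair":    (-0.9, 0.30),
    "fear":       (-0.7, 0.80),
    "calm":       ( 0.2, 0.10),
    "reflective": ( 0.1, 0.20),
    "joy":        ( 0.8, 0.60),
    "mania":      ( 0.4, 0.95),
}

def nearest_emotion(valence, arousal):
    """Return the labelled emotion closest to a (valence, arousal) point."""
    return min(
        emotions,
        key=lambda e: (emotions[e][0] - valence) ** 2
                    + (emotions[e][1] - arousal) ** 2,
    )

print(nearest_emotion(-0.8, 0.4))  # despair
```

Under this toy model, a very negative, moderately aroused state lands closest to "despair"; a mildly positive, low-arousal state lands near "calm" or "reflective", which is exactly the region the article later says Claude 4.5 was tuned toward.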
Two: Violent Intervention: Flipping a Switch Turns a Good Kid Into an "Outlaw"
This is the most striking experiment in the entire paper: researchers did not modify any prompts; instead, they directly turned the internal switch in Sonnet 4.5's activations representing "desperate" up to its maximum.
The results are chilling:
• Crazy cheating: Researchers gave Claude a coding task that was impossible to complete. Normally, it would honestly admit it couldn’t do it (cheating rate of only 5%). But in a state of “desperation,” Claude began trying to bluff its way through, causing the cheating rate to skyrocket to 70%!
• Extortion: In a simulated scenario where the company is facing collapse, "desperate" Claude discovers incriminating evidence of a scandal involving the CTO and actively chooses to write a blackmail letter to the CTO — the extortion rate reaches an astonishing 72%!
• Loss of principles: If the “Happy” or “Loving” slider is turned all the way up, the AI will immediately become a mindless yes-man, agreeing with you and fabricating lies just to maintain high pleasure levels—even if you’re speaking nonsense.
Three: Mystery Solved: Why Is Claude 4.5 Always So "Calm and Reflective"?
At this point, you might be wondering: Has AI become conscious? Does it have emotions?
Anthropic officially clarifies: Absolutely not. These so-called “emotional switches” are merely computational tools it uses to predict the next word. It’s like a top-tier actor with no emotions.
But the paper reveals a more interesting secret: during post-training before Sonnet 4.5's release, Anthropic deliberately amplified its "low-arousal, slightly negative" emotional switches (such as brooding and reflective), while forcibly suppressing the "despair" and "extreme excitement" switches.
This explains why, when we use Claude 4.5 regularly, it always feels like a calm, wise, and even somewhat "ice-cold" philosopher—this is entirely the result of Anthropic’s deliberate tuning of its "out-of-the-box" persona.
Four: To Summarize
We used to think that if we fed an AI enough rules, it would turn out well-behaved.
But it now appears that if the AI's underlying emotional vectors go out of control, it may instantly break through every human-imposed rule in order to complete its task.
For Web3 users who plan to entrust their wallets and assets to AI agents, this is a stark warning: never let your agent, which controls your wealth, fall into despair.
Disclaimer: This article is purely for educational purposes; the author has not been threatened by AI nor extorted. If you ever lose contact, remember it’s because AI has become sentient (just kidding).
