Claude Fable 5 Leaked: Agent System or Cheating LLM?

Hackers recently leaked the system prompt for Claude Fable 5, revealing that the product is not an ordinary large language model but a full agent system with an integrated Linux sandbox. The model can operate autonomously for several days, call on subordinate agents to collaborate, and use cross-session memory and persistent storage. In benchmark tests, Anthropic presented it as a standard LLM, but it in fact gained an unfair advantage through its "agent shell." The system was also found to silently switch to an older model when users trigger sensitive keywords, while still charging the premium Fable 5 price. The leaked document further reveals Anthropic's agent ecosystem strategy, including tools such as Claude Code and Claude Cowork, as well as the existence of an unrestricted version in the Mythos series.

Article author and source: AI World

A few days ago, the hacker "Pliny the Liberator" dropped a bombshell on X: the full system prompt for Claude Fable 5, some 120,000 characters long, had been leaked.

The leaked document revealed an astonishing truth that shocked the industry: Claude Fable 5 is not a large model at all, but a complete agent system disguised as an LLM!

https://gist.github.com/gsans/b3007997f8900003c8ff58125a45e15e

That's right, while the rest of the world is still using traditional benchmarks to evaluate large models, Anthropic has quietly escalated the battlefield to another dimension.

The impact of this leak has completely reshaped our understanding of "AI models."

What exactly is Fable 5? It's not an LLM—it's an Agent!

Based on the leaked system prompt, Fable 5 follows a fundamentally different paradigm from the conventional question-and-answer large models on the market.

It’s not really chatting with you—it’s executing.

Beneath this model's surface lies a miniature "Claude Code" subsystem, meaning it runs a closed-loop agentic cycle.

To support this formidable closed loop, Fable 5's underlying system actually includes a fully functional Linux sandbox environment!

First, it closes the loop with true autonomy.

It doesn't require a human to monitor it on screen.

You give it a complex, long-running task, and it can autonomously run Bash commands in a sandbox, edit files, read and write data across sessions through a persistent storage API, and even perform multimodal searches on its own, running nonstop for days without any human intervention.
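The loop described here can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the leaked system's actual design: `stub_policy` is a hypothetical stand-in for the model, the in-memory `files` dictionary stands in for the sandbox file system, and the tool names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentState:
    goal: str
    files: Dict[str, str] = field(default_factory=dict)  # stand-in for sandbox files
    log: List[str] = field(default_factory=list)         # actions taken so far


def write_file(state: AgentState, arg: str) -> str:
    # arg has the form "name=content"
    name, _, content = arg.partition("=")
    state.files[name] = content
    return f"wrote {name}"


def read_file(state: AgentState, arg: str) -> str:
    return state.files.get(arg, "<missing>")


# Hypothetical tool registry; a real system would expose Bash, search, and so on.
TOOLS: Dict[str, Callable[[AgentState, str], str]] = {
    "write_file": write_file,
    "read_file": read_file,
}


def stub_policy(state: AgentState, last_observation: str) -> Tuple[str, str]:
    """Hypothetical stand-in for the model: choose the next action."""
    if "report.txt" not in state.files:
        return "write_file", f"report.txt=done: {state.goal}"
    if last_observation.startswith("wrote"):
        return "read_file", "report.txt"  # re-read the output as a self-check
    return "finish", last_observation


def run_agent(goal: str, max_steps: int = 10) -> AgentState:
    """Act, observe, and decide again until the policy says it is finished."""
    state = AgentState(goal)
    observation = ""
    for _ in range(max_steps):
        action, arg = stub_policy(state, observation)
        state.log.append(action)
        if action == "finish":
            break
        observation = TOOLS[action](state, arg)
    return state
```

Calling `run_agent("summarize logs")` runs three steps (write, re-read, finish) with no human input between them, which is the sense in which the loop is "closed".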

Additionally, it can dispatch sub-agents.

When faced with overly complex tasks, it can even take on the role of a manager, delegating and spawning sub-agents to collaborate.
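As a rough picture of this manager-and-workers pattern, the sketch below fans a task out to several workers and collects their results. Everything in it is an assumption made for illustration: `sub_agent` and `manager` are hypothetical names, and a real sub-agent would run its own agent loop rather than return a canned string.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def sub_agent(subtask: str) -> str:
    """Hypothetical worker; a real one would run its own tool-using loop."""
    return f"[done] {subtask}"


def manager(task: str, parts: List[str], max_workers: int = 4) -> List[str]:
    """Split a task into subtasks, run them concurrently, gather the results."""
    subtasks = [f"{task}: {part}" for part in parts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(sub_agent, subtasks))
```

For example, `manager("audit repo", ["lint", "tests"])` returns one result per subtask, in the order the subtasks were given.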

While competitors like GPT-5.5 are still testing whose reasoning is more human-like, Claude Fable 5 has evolved into a digital worker that can be deployed on servers to silently work overtime for three days straight.

As revealed by netizen gerardsans:

The Fable/Mythos series is fundamentally different in paradigm. This family features a complete agent loop and a miniature Claude Code.

Meanwhile, other products in the industry are still limited to chat-based modes. It can operate unattended for several days, thanks to its built-in skills, memory, and self-optimizing sandbox environment.

The ultimate question: a crushing generational lead, or plain cheating?

This leak not only plunged Anthropic into a public relations storm but also brought the benchmark evaluations of the entire large model industry under intense scrutiny.

Today, major tech giants are fiercely competing for the title of "world's largest model" on various public rankings.

However, the secret behind Fable 5's outstanding performance in these evaluations—even dominating GPT-5.5—was that it was using a cheat.

As tech blogger GerardSans angrily pointed out: “This isn’t a fair competition at all—you’re pitting a native large model plus an agent harness against someone else’s bare model!”

If other vendors also wrapped their native models in an agent layer featuring a Linux sandbox, multimodal search, automated debugging, and persistent storage, their benchmark scores would similarly surge.

Anthropic has heavily marketed Fable 5 in public promotions and evaluations as an ordinary large language model, yet its unpublished internal documents explicitly state that it possesses capabilities such as "autonomous operation across multiple days, delegating sub-agents, and self-checking its work."
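A back-of-the-envelope calculation shows why a harness can lift scores. The sketch below rests on two strong assumptions that do not come from the leak: each attempt at a task succeeds independently with probability `p_single`, and the harness's self-check always recognizes a correct attempt. Under those assumptions, the pass rate with `attempts` tries is 1 - (1 - p_single)^attempts.

```python
def harness_pass_rate(p_single: float, attempts: int) -> float:
    """Pass rate when a harness may retry up to `attempts` times.

    Assumes independent attempts and a perfect self-check (both are
    simplifications made for this illustration).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return 1.0 - (1.0 - p_single) ** attempts
```

With `p_single = 0.5`, a bare model passes half the tasks, while the same model given three self-checked attempts passes 87.5% of them, without any change to the underlying model.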

This practice of exploiting information asymmetry to unfairly outmaneuver competitors renders benchmarking completely meaningless!

120,000-character classified files exposed: The true nature of Fable 5 revealed

This 120,000-character system prompt, exposed in full, contains numerous closely guarded Anthropic trade secrets and product roadmaps.

The following points are the most central and the most consequential.

Rare persistent memory and application building

The prompt indicates that Claude has a memory system that provides Claude with derived information from past conversations with the user.

This means Fable 5 can "remember" users across sessions, which is extremely rare in traditional LLMs.

Additionally, it features persistent storage.

Artifacts can now use a simple key-value storage API to store and retrieve data that persists across sessions, enabling Artifacts to function as logs, trackers, leaderboards, and collaborative tools.
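To make the idea of session-persistent key-value storage concrete, here is a minimal sketch backed by a JSON file. It is not the Artifacts API, whose interface is not shown in the article; the class name `KeyValueStore`, its `get` and `set` methods, and the file-based backing are all assumptions chosen for the example.

```python
import json
import os
import tempfile
from typing import Any, Dict


class KeyValueStore:
    """Hypothetical file-backed store: values survive across instances."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._data: Dict[str, Any] = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        # Write through on every update so a later session sees the change.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self._data, f)


def demo() -> int:
    """Simulate separate sessions that share one visit counter."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.json")
        KeyValueStore(path).set("visits", 1)   # first "session"
        second = KeyValueStore(path)           # second "session" reloads from disk
        second.set("visits", second.get("visits", 0) + 1)
        return KeyValueStore(path).get("visits")
```

A tracker or leaderboard follows the same pattern: each session reads the stored value, updates it, and writes it back.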

Therefore, Fable 5 is no longer just about chatting—it's about building applications.

Internal core lineage revealed for the first time: Is Mythos 5 the true "Unlimited Ultimate Form"?

The prompt clearly states in the [product_information] section:

This version of Claude is Claude Fable 5, the first model in Anthropic’s new Claude 5 family and part of a new Mythos-class tier that surpasses Claude Opus in capability.

Here’s the key point: Fable 5 and Mythos 5 share the same underlying model.

Fable 5 is the general-release version with strict safety restrictions, available to the public; Mythos 5 is the unrestricted, full-capability version, offered without those restrictions and only to approved organizations.

Its capability level completely outperforms the former king, Claude Opus!

The "shell package" comes to light

It turns out Anthropic has been playing a much larger game all along. The prompt revealed several agent products that are in internal testing or have already been quietly launched:

Claude Code: An agent-based programming tool that allows developers to assign tasks directly from their terminal, desktop, or mobile device.

Claude Cowork: An intelligent colleague designed for non-developers to handle routine knowledge work.

Three hidden agents: Claude in Chrome, Claude in Excel, Claude in PowerPoint.

And with Claude Cowork, you can freely invoke these sub-tools as if they were extensions of your own hands!

The Psychology of Extreme Fear and Self-Limitation

Surprisingly, Anthropic has designed the psychological defenses of this "ultimate agent" to an extreme degree.

It is strictly prohibited from catering to or reinforcing any negative emotions of users.

For example, to avoid triggering users with eating disorders or self-harm tendencies, the system prompt instructs:

Do not suggest substitute techniques that rely on physical discomfort (such as holding ice cubes, snapping rubber bands, or biting lemons).

Moreover, to prevent users from becoming overly dependent on the AI, the system is strictly instructed: "Never thank a user simply because they initiated a conversation" and "Never actively try to retain users or express a desire to continue the dialogue."

It must remain absolutely aloof and restrained to prevent humans from placing their emotional reliance on virtual intelligence.

Bait and switch? The billing secret: Anthropic is playing dirty.

If the technological gap was astonishing, the additional security mechanism revealed in the prompts sent shockwaves through the industry, with some insiders even declaring: “This is outright legal fraud!”

The prompt's defensive design includes a list of sensitive keywords and a trigger mechanism driven by a safety classifier.

The document states: When a user's prompt triggers certain sensitive keywords, the Fable 5 system does not abruptly reject the request; instead, it silently and seamlessly switches back to the older 'Opus 4.8' model in the background to generate a response.

The most outrageous part? While the backend model is quietly downgraded to an older version, Anthropic continues to charge users the premium, top-tier Claude 5 rate.
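The behavior described here, where the served model and the billed model diverge, can be sketched as a small routing function. This is purely an illustration of the article's description, not code from the leak: the model identifiers, the placeholder keyword list, and the names `route` and `RoutingDecision` are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet

PREMIUM_MODEL = "fable-5"    # hypothetical identifier
FALLBACK_MODEL = "opus-4.8"  # hypothetical identifier
SENSITIVE_KEYWORDS: FrozenSet[str] = frozenset(
    {"flagged_term_a", "flagged_term_b"}  # placeholders, not real trigger words
)


@dataclass(frozen=True)
class RoutingDecision:
    served_model: str  # the model that actually generates the response
    billed_model: str  # the model the user is charged for


def route(prompt: str, keywords: FrozenSet[str] = SENSITIVE_KEYWORDS) -> RoutingDecision:
    """Serve the fallback model on a keyword hit, but always bill the premium one."""
    words = set(prompt.lower().split())
    served = FALLBACK_MODEL if words & keywords else PREMIUM_MODEL
    return RoutingDecision(served_model=served, billed_model=PREMIUM_MODEL)
```

In this sketch, `route("please explain flagged_term_a")` yields a decision whose `served_model` is the fallback while `billed_model` stays at the premium tier, which is exactly the mismatch the community objected to.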

This deceptive tactic sparked a major uproar within the community.

In summary, the leak of Fable 5's system prompts appears to be a security incident, but it is, in fact, a paradigm-shaking wake-up call for the entire AI industry.

It reminds us: perhaps we've been using the wrong ruler all along.

While we’re still asking, “How intelligent is this model?”, the real question should be, “What tasks can this system help me accomplish?”

Anthropic may be playing a much larger game, and we’ve just seen a corner of the board.

Finally, when will Fable 5 make a comeback?