Anthropic has released the new model Fable 5, available for free until the 22nd. User tests have found that Fable’s safety guardrail detection triggers far more frequently than the company’s claimed 5%, with even routine coding tasks or simple greetings sometimes triggering an automatic switch back to the older model, Opus 4.8. More seriously, the system includes an anti-distillation mechanism that silently degrades response quality if it suspects a user intends to use Claude’s outputs to train another AI. Researchers are concerned this could negatively impact academic research and technical collaboration.

Author and source: Quantum Bit

Don't get too excited yet!

Anthropic has just released its new Claude model, Fable 5, which many people may not need at all!

Many netizens who have tested it found that Fable 5's safety guardrail appears to trigger far more often than the official claim of less than 5%.

Whether it's a regular coding task or even a simple greeting, the conversation might be automatically routed back to the old model, Opus 4.8.

Even more absurdly, I fell for it too. I asked Claude to help me look up some information to enrich the background.

It thought for two steps and then, snap, switched to Opus.

In other words, you think you're interacting with Anthropic's newly released most powerful model, but midway through the conversation, the other side has quietly switched personnel.

And it's not just that security checks are prone to false positives—what's even more serious is yet to come:

Anthropic has also embedded an anti-distillation mechanism in its 319-page system card.

If the system suspects you're using Claude's outputs to train your own AI model, it won't even inform you what happened—it will simply reduce the quality of Fable's responses.

It can be said that one hand stops you from acting maliciously while the other stops you from copying its work, perfectly in line with Anthropic's usual style.

Why does the fable always turn into an octopus?

Let's catch up anyone who hasn't checked the news today.

Early this morning, Anthropic finally released the two highly anticipated models—

Mythos and Fable.

Among these, the biggest highlight of Fable 5 is that Anthropic is, for the first time, making Mythos-level capabilities available to general users.

The difference between Fable and the official version of Mythos is an additional safety guardrail.

Currently, Fable is freely available to everyone until the 22nd (from the 22nd, it is accessible only via the API), while Mythos remains available only to select Anthropic partners.

According to the official description, Fable's software engineering, knowledge work, and visual understanding capabilities have been significantly enhanced, surpassing all previously released Claude models.

In simple terms, these two currently sit at the pinnacle of large models, with capabilities maxed out across the board.

As soon as the new model was released, Karpathy, who had just joined Anthropic, praised it immediately.

Boris, the creator of Claude Code, also speaks highly of it.

However, impressive as it may be, once people actually started using it, they realized that Fable kept turning into an octopus (Opus).

The reason is simple.

Anthropic has installed a classifier in Fable that automatically transfers the conversation to Opus 4.8 whenever it detects discussions about cybersecurity, biology, chemistry, or attempts to distill and train a model using Claude.

This rule is clearly written in black and white on page 12 of the system card.

In actual use, the switch occurs during Fable’s thought process—it detects something feels off, doesn’t ask you, and simply switches over.

If you'd like to continue, either adjust the prompt until it's satisfied, or open a new window.

The official technical blog stated that this detection system triggers on average less than 5% of the time. However, users quickly noticed that this 5% doesn't look like 5%.
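One rough way to check a claim like this against your own experience is a binomial tail probability: if each session really had a 5% chance of triggering a fallback, how likely would it be to see as many fallbacks as you did? The sketch below is only an illustration. The session and fallback counts are hypothetical numbers, not measured data, and `binomial_tail` is a helper name invented for this example.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers for illustration only, not measured data.
sessions = 40        # sessions one user ran
fallbacks = 8        # sessions that were switched to the older model
claimed_rate = 0.05  # the "less than 5%" figure, treated as exactly 5%

p_value = binomial_tail(sessions, fallbacks, claimed_rate)
print(f"P(at least {fallbacks} fallbacks in {sessions} sessions at 5%) = {p_value:.4g}")
```

A small probability here would only show that this one user's sessions are unlikely under a flat 5% rate. The official figure is described as an average, so users whose work sits close to the flagged topics could see far more fallbacks without the overall number being wrong.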

Someone said they were just analyzing code, and they got cut off anyway.

Those conducting security audits are openly saying they feel targeted and can no longer do their work.

Some also say it simply doesn’t work, and even reviewing the codebase would be rejected by Fable.

Perhaps the most absurd thing is that a netizen sent Fable its own system card for analysis, and it still cut them off.

Another scientist working in biomedicine said that Fable simply cannot be used due to the censorship of prohibited terms.

And this is not an isolated case; many biology users have reported that Fable is virtually unusable.

Boris acknowledged the issue in the comments and said it is being addressed.

The real nuance here is that Fable will at least notify you in each of these three high-risk scenarios:

Hey, I've switched your model for you.

But if it suspects you're researching how to train the next generation of large models, it switches to another mode.

The system card specifies that Claude's effectiveness is deliberately limited for requests related to cutting-edge LLM development, such as building pre-training pipelines, distributed training infrastructure, or ML accelerator design.

In this scenario, Claude does not switch models, does not display prompts, and does not notify the user; instead, it quietly makes itself a bit less intelligent.
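The contrast between the two behaviours can be summarised in a small sketch. It is purely illustrative: the `Category` labels, the `Policy` type, and the `policy_for` function are hypothetical names that only restate what this article reports, not Anthropic's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CYBERSECURITY = "cybersecurity"
    BIOLOGY = "biology"
    CHEMISTRY = "chemistry"
    FRONTIER_LLM_DEV = "frontier_llm_dev"
    OTHER = "other"


@dataclass(frozen=True)
class Policy:
    model: str           # model that ends up answering
    user_notified: bool  # whether the user is told about the change
    degraded: bool       # whether answer quality is deliberately reduced


# Hypothetical labels for the three high-risk areas named in the article.
VISIBLE_FALLBACK = {Category.CYBERSECURITY, Category.BIOLOGY, Category.CHEMISTRY}


def policy_for(category: Category) -> Policy:
    """Map a detected category to the behaviour the article describes."""
    if category in VISIBLE_FALLBACK:
        # The conversation is handed to the older model and the user is told.
        return Policy(model="Opus 4.8", user_notified=True, degraded=False)
    if category is Category.FRONTIER_LLM_DEV:
        # Same model, no notice, quietly weaker answers.
        return Policy(model="Fable 5", user_notified=False, degraded=True)
    # Everything else is answered normally.
    return Policy(model="Fable 5", user_notified=False, degraded=False)
```

Laid out this way, the users' complaint is easy to see: only the last branch and the silent branch look identical from the outside, because neither produces any visible signal.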

Anthropic's original text is highly academic: Prompt Modification, Steering Vector, PEFT (parameter-efficient fine-tuning). (System Card, Page 12)

In plain terms, you think you're chatting with the full-power version of Fable, but secretly, the other side has switched to battery-saving mode.

Anthropic has truly welded its moat directly into the reasoning chain.

As for how the system makes the determination, it is clearly explained on pages 58–59 of the system card.

Fable runs on a two-stage detection system:

The first-layer probe directly examines the model's internal activations and screens all requests; the second layer delegates risk assessment to an independent classifier.

Once triggered, the client will automatically switch to Opus 4.8.
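A two-stage cascade of this kind can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the real system: the `route` function, the `probe_threshold` value, and the toy probe and classifier are all hypothetical, and the article does not say what input the second-stage classifier actually receives (request text is assumed here).

```python
from typing import Callable, Sequence

PRIMARY_MODEL = "Fable 5"
FALLBACK_MODEL = "Opus 4.8"


def route(
    activations: Sequence[float],
    request_text: str,
    probe: Callable[[Sequence[float]], float],
    classifier: Callable[[str], bool],
    probe_threshold: float = 0.5,  # hypothetical value
) -> str:
    """Return the model that should answer under a two-stage cascade."""
    # Stage 1: a cheap probe over internal activations, run on every request.
    if probe(activations) < probe_threshold:
        return PRIMARY_MODEL
    # Stage 2: only flagged requests reach the independent classifier.
    if classifier(request_text):
        return FALLBACK_MODEL
    return PRIMARY_MODEL


# Toy stand-ins so the sketch runs; real probes and classifiers are learned models.
def toy_probe(activations: Sequence[float]) -> float:
    return sum(activations) / len(activations)


def toy_classifier(text: str) -> bool:
    return "exploit" in text.lower()
```

The usual appeal of such a design is cost: the first stage is cheap enough to run on everything, while the more expensive second stage only sees the small share of requests the probe flags. The flip side is that an over-sensitive first stage or an over-broad classifier would show up directly as the false positives users are describing.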

Anthropic even acknowledged in the report that, since the classifier almost always triggers during cybersecurity tests, Fable 5's actual performance on cybersecurity tasks is essentially equivalent to Opus 4.8.

In short, Fable 5 is still a conditionally released model: it offers Mythos 5-level capabilities in most scenarios but automatically downgrades to Opus 4.8-level capabilities in high-risk areas.

Why would Claude do this?

Today, the new model went live, and limits were reset simultaneously. After getting started, users increasingly felt something was off, and complaints have been growing, primarily centered on two issues.

The first is how often the safety guardrails are triggered. Anthropic says that fewer than 5% of sessions trigger a fallback, but many users' experiences clearly suggest the real figure is higher.

The second is the usage policy for Fable.

Anthropic did not fully open access this time but instead adopted a limited release approach.

Meanwhile, Fable’s token consumption cost is significantly higher than Opus’s, nearly double.

This has left many subscribers wondering:

If the best models are both subject to usage limits and not guaranteed to be consistently available, will the future move toward pay-per-use pricing?

Of course, some also attribute the cause to business factors.

Some netizens believe that Anthropic is currently in a critical phase before its IPO and needs to demonstrate to investors that it still possesses cutting-edge model capabilities.

So the strongest model can be released for demonstration, but not without restrictions.

Researchers are also concerned about another issue.

If the model deliberately reduces answer quality as soon as it detects content related to cutting-edge LLM research, it would clearly be bad news for academic research and technical exchange.

More critically, users have no idea that any of this is happening. It doesn’t pop up, doesn’t notify you, and doesn’t explain why the answer suddenly got worse.

You'll only feel that Claude suddenly seems less smart today.

AI researcher Nathan Lambert's assessment is also straightforward:

It may be unavoidable for model providers to implement safeguards on capabilities.

But at the very least, users should be informed when advanced capabilities have been removed.

Anthropic’s Fable 5 Triggers High Safety Guardrails; Users Report Frequent Model Downgrades
