Anthropic releases Claude Fable 5, redefining human-machine collaboration.

Author and source: AI New Era

[Guide] After the release of Fable 5, the Claude Code team said they no longer verify whether Claude is doing the job correctly, but instead focus on verifying whether it is doing the right job. Developers have shifted from being supervisors monitoring code output to product managers setting standards, and the criteria for evaluating excellent engineers have changed accordingly.

Overnight, the internet was flooded with an image of a "5" made up of fluttering butterflies.

The star of this image is none other than Claude Fable 5, Anthropic’s newest and most powerful model.

https://www.anthropic.com/news/claude-fable-5-mythos-5

Anthropic has announced that this is their first Mythos-level model with safety measures designed for general-purpose use, outperforming any previously released model.

Immediately afterward, Anthropic announced that it had reset all users' hourly and weekly rate limits, allowing everyone to "enjoy Fable 5 to the fullest."

On the developer side, everything changed overnight.

In the words of the Claude Code team: Previously, they focused on whether Claude completed tasks correctly; now, they focus on whether Claude is doing the right things.

Thariq Shihipar, a member of the Claude Code team, believes that Fable represents a major breakthrough in the model domain and will transform how people collaborate with Claude. With this powerful tool at their disposal, "it's time to be more ambitious."

Thariq is the author of the AskUserQuestion tool, which enables the AI to interview you in reverse: before writing any code, it presents a series of multiple-choice questions to clarify implementation details, edge cases, and trade-offs. The longer the model runs, the more critical this ability to ask clarifying questions upfront becomes.

Thariq also shared the changes brought by Fable 5, as summarized by the team:

Three things have been rewritten: how you assign tasks to it, how you verify its work, and how many of them you can manage simultaneously.

First, consider Stripe’s case in Anthropic’s announcement: a 50-million-line Ruby codebase, which would take humans over two months to migrate, was completed by Fable 5 in one day.


Compressing more than two months into a single day is no longer just a speedup: the division of labor between humans and AI has been redefined once again.

From a process-focused supervisor to a standards-driven product manager

The focus of this upgrade is not on benchmark scores.

Anthropic positions Claude Code as an "agentic coding environment."

It can read files, run commands, and modify code, pushing problems forward on its own while you watch, interrupt, or even step away.

This is the key: if it can do the work on its own, why are you still watching? The official best practices for Claude Code mention this:

If you don't give Claude a check it can run, you become the verification loop yourself: every error has to wait for you to notice it.

Claude Code official best practices: Give Claude a runnable check, test, build, or screenshot comparison—otherwise, you become the verification loop yourself.

In the past, you were the supervisor, sitting at the screen, watching it write step by step and correcting one line at a time. That has changed. Your role has shifted from “giving step-by-step instructions” to “defining goals, providing sufficient context, and setting clear acceptance criteria.”

"Setting goals and providing context" may sound simple, but implementing them is not easy—the official best practices also provide guidance.

Don’t have it start writing code right away—first let it explore, then plan, and only then take action, so it doesn’t end up solving the wrong problem.

Another key point: Use the previously mentioned AskUserQuestion to have Claude first interview you, asking one by one about the implementation details, edge cases, and trade-offs you haven’t clarified, ultimately resulting in a SPEC.md file.

Don’t worry that this preparation is a waste of time. When the model is powerful enough to work independently, stating the requirements clearly becomes far more valuable than writing the code by hand.

This is exactly what happened with the Claude Code team: shifting from verifying whether Claude was doing things correctly to verifying whether it was doing the right things.

Letting go feels great, but how can you trust it?

Letting go may feel good, but why trust Claude?

One of the most frustrating things about Claude is that it can be confidently wrong. And the stronger the model, the more convincing its incorrect output, which makes errors harder to spot at a glance.

Claude stops when the work "looks finished," and that is precisely the most dangerous signal.

Without a check it can run, "looks finished" is the only criterion Claude has, and whatever it missed ultimately becomes your problem.

The official solution is: give it something that can determine "pass" or "fail."

For example: a test suite, a build exit code, or a screenshot comparison against the design mockup. Claude runs the check, reads the result, makes adjustments, and repeats until the check passes. The loop closes on its own.
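
The run, check, read, adjust cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Claude Code is implemented: `run_check`, `verify_loop`, and the `fix` callback are hypothetical names, and the "check" is any command whose exit code is 0 on pass and non-zero on fail.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_check(cmd: Sequence[str]) -> tuple[bool, str]:
    """Run a check command; an exit code of 0 counts as a pass."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def verify_loop(
    cmd: Sequence[str],
    fix: Callable[[str], None],
    max_rounds: int = 5,
) -> bool:
    """Check, read the output, adjust, and repeat until the check passes.

    `fix` is a hypothetical stand-in for the agent revising its work
    based on the failure output. Gives up after `max_rounds` checks.
    """
    for attempt in range(max_rounds):
        passed, output = run_check(cmd)
        if passed:
            return True
        if attempt < max_rounds - 1:
            fix(output)  # only adjust if another check will follow
    return False
```

The point of the design is that "done" is decided by the command's exit code, not by the agent's own impression. For instance, `verify_loop([sys.executable, "-m", "pytest", "-q"], fix)` would keep iterating against a test suite, assuming pytest is installed in that environment.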

More importantly, in Claude Code, use /goal. Set a completion condition, and it will keep working across sessions without you needing to prompt it repeatedly.

After each round, a separate model evaluates the result. It is not the Claude doing the work but a smaller, faster, cheaper model (Haiku by default) that reviews the completion condition and the conversation history, then outputs either “Achieved” or “Not Achieved” along with a brief reason. If the goal is not achieved, work continues; if it is, the run ends automatically.

The /goal command in Claude Code: Set completion criteria, and at each step, a small model determines whether the goal has been achieved; if not, continue working.

It looks like autonomous driving. But one thing must be clear: that small scoring model cannot execute commands or read files on its own—it can only observe the evidence presented by Claude in the conversation.

In other words, whether this cycle runs smoothly depends entirely on whether Claude presents the real thing. If the criteria are too loose or Claude merely claims “it ran,” the evaluator could still pass it.
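
A toy model makes this limitation concrete. In the sketch below, a regular expression stands in for the evaluator purely for illustration (the real evaluator is a language model, and every name here is hypothetical). What carries over is the constraint: the evaluator is a function of the conversation text alone and cannot run anything itself.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    achieved: bool
    reason: str


def evaluate(transcript: list[str], evidence_pattern: str) -> Verdict:
    """Toy evaluator: it sees only transcript text and cannot execute commands."""
    for message in transcript:
        if re.search(evidence_pattern, message):
            return Verdict(True, f"matched evidence: {message!r}")
    return Verdict(False, "no matching evidence in the conversation")


# Two illustrative transcripts: a bare claim vs. pasted test-runner output.
claim_only = ["I ran the tests and everything passed."]
with_output = ["$ pytest -q", "42 passed in 1.37s"]

loose = r"passed"                    # satisfied by a bare claim
strict = r"^\d+ passed in [\d.]+s$"  # requires the runner's summary line
```

With the loose criterion, `evaluate(claim_only, loose)` is marked achieved even though no evidence was shown. The strict criterion rejects the bare claim and accepts only the transcript containing the runner's output. Neither can detect fabricated output, which is why human review of the evidence still matters.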

Self-verified delivery is therefore not the same as shipping without review.

Having the courage to let go relies on being able to see evidence at any time, not on betting that the model is smart.

One person begins to command hundreds of agents.

If /goal lets a single Claude work longer, dynamic workflows let a group of Claudes work together.

The way it works is that Claude writes a JavaScript script for you that orchestrates a large number of sub-agents running in the background.

Official use cases include code audits across an entire codebase, large-scale migrations involving 500 files, and research questions requiring cross-verification.

How large is the operational scale? A single run can mobilize up to 1,000 agents, with a maximum of 16 running concurrently.

The workflows constraint table in Claude Code's official documentation highlights that a maximum of 1,000 agents can be run in a single execution.
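
To make the two limits concrete (1,000 agents per run, at most 16 at a time), here is a minimal sketch of bounded concurrency. It is written in Python for illustration only and is not the JavaScript orchestration script Claude generates. `run_all` and `agent` are hypothetical names, and the "work" is a placeholder.

```python
import asyncio


async def run_all(num_agents: int = 1000, max_concurrent: int = 16) -> tuple[int, int]:
    """Run num_agents stub tasks, never more than max_concurrent at once.

    Returns (tasks completed, peak number running simultaneously).
    """
    gate = asyncio.Semaphore(max_concurrent)
    running = 0
    peak = 0

    async def agent(task_id: int) -> int:
        nonlocal running, peak
        async with gate:  # wait here until one of the slots is free
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0)  # placeholder for real sub-agent work
            running -= 1
        return task_id

    results = await asyncio.gather(*(agent(i) for i in range(num_agents)))
    return len(results), peak
```

Calling `asyncio.run(run_all())` returns the number of completed tasks and the observed peak. The semaphore is what separates "1,000 agents in one run" from "1,000 agents at once": every task is queued up front, but only 16 hold a slot at any moment.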

Claude Code even includes a built-in workflow called /deep-research, which specifically breaks down a question into multiple perspectives, cross-verifies the information, votes to eliminate unsupported claims, and ultimately delivers a report with citations.

This means that Claude Code is no longer just a chat box in your terminal; it is evolving into an engineering agent system that can run continuously, orchestrate tasks, and be reused.

One person can command an AI army with just a single command in /workflows.

Autonomy does not equal replacement

Fable 5 is indeed stronger.

Anthropic says it can operate autonomously for longer than any previous Claude model, and the longer and more complex the task, the greater its advantage. That does not mean programmers can let go completely.

On the contrary, official best practices consistently emphasize four tasks that must be handled by humans: defining verification standards, managing permissions, controlling context, and reviewing evidence.

It even specifically lists common failure patterns to avoid, one of which is called the "trust-then-verify gap," referring precisely to situations where Claude provides a response that appears plausible but fails to handle edge cases.

There is only one fix: ship it only if you can verify it; if you cannot verify it, do not ship it.

Costs and barriers cannot be avoided.

Fable 5 is priced at $10 per million input tokens and $50 per million output tokens. Its more powerful "twin," Mythos 5, is built on the same underlying model but with certain safety restrictions relaxed; it is currently available only to a select group of cybersecurity professionals and infrastructure providers.
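
As a rough worked example of what those prices mean for a long-running task, the sketch below applies the two quoted rates. The token counts are hypothetical, and the calculation ignores any caching, batching, or discount schemes, none of which are described in this article.

```python
# Rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 10
OUTPUT_USD_PER_MTOK = 50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate cost from raw token counts at the quoted list prices."""
    total_micro_usd = (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    )
    return total_micro_usd / 1_000_000


# Hypothetical run: 2,000,000 input tokens and 300,000 output tokens.
# Input: 2.0 x $10 = $20; output: 0.3 x $50 = $15; total: $35.
example_cost = estimate_cost_usd(2_000_000, 300_000)
```

Output tokens cost five times as much as input tokens at these rates, so a task that generates large amounts of code or reports is priced very differently from one that mostly reads.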

Fable 5 also comes with an additional classifier guardrail.

When encountering sensitive topics such as cybersecurity or biochemistry, it automatically hands off the response to Opus 4.8. The official statement indicates that over 95.0% of sessions do not trigger this fallback, but the safeguards are set conservatively and may occasionally block legitimate requests.

Taking on long-term tasks doesn’t mean you can completely let go. The more autonomous the task, the more important it is to know how to verify results.

Back to Thariq’s statement: It’s time to be more ambitious.

The underlying message behind this ambition is to give you the confidence to delegate bigger challenges. But letting go still requires control—it’s more like an art that blends experience with intuition.

Rules are a starting point, not dogma.

After explaining all these rules and methods, Anthropic added one final note: they are all starting points, not dogma.

In other words, this set of best practices works well in most cases, but may not be suitable for every scenario.

Sometimes you should let context accumulate, because you're deep in a complex problem and the history matters; sometimes you should skip the plan and let Claude proceed directly, because the task is inherently exploratory; sometimes a vague prompt is exactly right, because you want to see how Claude interprets the problem before deciding whether to constrain it.

The trick is to pay attention to what works—there’s no one-size-fits-all rule.

When Claude performs well, reflect on what you did: how you wrote the prompt, what context you provided, and which mode you used; when it struggles, consider whether the prompt is too vague or the task too complex.

Over time, you’ll develop an intuition that no guide can teach: when to go into detail and when to leave things unsaid, when to plan and when to let it explore.

Only then will you truly understand how to work with it.

When Fable 5 can accomplish two months of work in a single day, the scarcest skill for programmers changes: it is no longer writing good code but defining what good code is. The very definition of “being able to program” is being quietly rewritten.

The most valuable engineer of the future won’t be the code supervisor, but the one who asks the best questions, sets the standards, and performs the acceptance review.

Reference materials:

https://www.anthropic.com/news/claude-fable-5-mythos-5

https://code.claude.com/docs/en/best-practices

https://code.claude.com/docs/en/common-workflows

https://x.com/ClaudeDevs/status/2064399512664526853