Anthropic Faces Government Pushback Over Fable AI Model

Author: Ben Thompson

Compiled by Shenchao TechFlow

Shenchao Overview: Anthropic’s new model, Fable, was abruptly halted by the U.S. government just two months after its release—officially due to a “security breach,” but in reality exposing a dual conflict between AI labs and both the government and the software industry. A company that markets itself on “safety” is turning the safety narrative into a business moat, while its true target is the user data held by companies like Microsoft.

I understand the skeptics’ position: they believe Anthropic’s public statements, particularly those made when releasing models, are designed to spread fear for marketing purposes. Two months ago, Anthropic announced Mythos Preview, claiming the model was too dangerous to release publicly, especially because of its powerful cybersecurity capabilities. Then, two months later, the company publicly released Fable, a version of Mythos with various safety safeguards.

Based on my limited experience, Fable is a truly exceptional model. It is now difficult to evaluate models objectively beyond their programming performance, but subjectively I find interacting with Fable outstanding: it makes other models, including GPT-5.5 and Opus 4.8, seem small and simplistic by comparison. I’ve had this feeling only twice before, once with GPT-4 and once with Grok-4, both of which represented a new generation of foundation models in terms of scale and complexity. I believe Fable stems from a new pre-training run and is the first of this new generation.

Therefore, I fully accept that Fable/Mythos really is stronger at identifying and exploiting security issues, which makes Anthropic’s cautious rollout reasonable. However, the problem with publicly releasing the model is that safeguards can be jailbroken, and this evidently happened shortly after the release.

Anthropic once again confronts the U.S. government

What happened next is somewhat unclear. Anthropic wrote in a blog post:

The U.S. government has invoked national security authorities to issue an export control directive suspending all foreign nationals’ access to Fable 5 and Mythos 5, both within and outside the United States, including Anthropic’s foreign employees. As a practical effect of this order, we must immediately disable access to Fable 5 and Mythos 5 for all customers to ensure compliance. Access to all other Anthropic models remains unaffected.

We received the government’s directive at 5:21 PM Eastern Time today. The letter did not provide specific details regarding national security concerns. We understand the government believes a method has been discovered to bypass or “jailbreak” Fable 5. We reviewed a demonstration that used this specific technique to identify a small number of known minor vulnerabilities. These vulnerabilities all appeared relatively simple, and we found that other publicly available models could detect them without requiring a bypass.

Anthropic further argues that non-universal jailbreaks are inevitable and limited in scope, and that there is no evidence that a universal jailbreak exists. The jailbreaks that have been discovered appear to be those reported by Amazon, which is notable because Amazon is both an investor in Anthropic and a major provider of the company’s inference services. At the time of writing, Anthropic’s executives were in Washington, D.C., attempting to resolve what they insist is a misunderstanding, but which White House officials have suggested reflects the company leadership’s disregard for legitimate national security concerns.

Given the numerous disputed facts, I have little to add regarding the current conflict; however, I am not surprised that the conflict is taking place: I explained in my article “Anthropic and Alignment” that a clash between the U.S. government and Anthropic was inevitable. In this regard, those who believe Mythos is not yet powerful enough to warrant aggressive government action are missing the point: if it isn’t powerful enough now, the next one will be—or the one after that—especially as models are becoming increasingly useful at creating their successors.

However, this raises another question—one that seems to validate the skeptics’ point: If Mythos is so dangerous, why release Fable in the first place, and why defy the government to do what you claim you want to do? In fact, I find Anthropic’s actions entirely understandable; what sets the company apart is how it justifies these actions—justifications that simultaneously fuel the skeptics and grant Anthropic its magic.

Economic necessity

In the early years of AI, the greatest economic value flowed to compute, for obvious reasons: insufficient supply to meet demand led to soaring prices; the biggest beneficiaries were NVIDIA, TSMC, and memory manufacturers (SK Hynix, Samsung, and Micron). Meanwhile, Anthropic and OpenAI collectively lost tens of billions of dollars building cutting-edge models, which, upon release, were distilled and commoditized by open-source models, primarily from China.

This is the labs’ pessimistic scenario, one in which they can never cover their costs because their differentiation is fleeting and free alternatives become “good enough”, and I find it plausible. In a world where models are interchangeable, models become commodities, and most value flows elsewhere. Right now that is compute, but over time, once there is sufficient compute, the most valuable position in the value chain will be what it has always been: owning user touchpoints.

Therefore, it has always been clear to me that frontier labs have an economic imperative to get closer to users. If you own the user touchpoint, you have meaningful lock-in, and the best way to own the user touchpoint is to become the canvas for everything they need to do. This, in turn, means frontier labs are increasingly in conflict with software companies: software owns the user touchpoint, but the long-term interest of frontier labs is not simply to become a commodity input for software, but to directly replace it.

Meanwhile, software companies are striving to do the opposite. Satya Nadella outlined his vision for how companies should build on models in a post on X:

Every company must build what I call human capital and token capital. Human capital encompasses its employees’ knowledge, judgment, relationships, originality, and pattern recognition, while token capital refers to the AI capabilities a company builds and owns. Crucially, as token capital grows, human capital does not become less valuable—it becomes more valuable! I believe human initiative will be the driving force behind the growth of token capital. Humans will set ambitious goals, connect dots across domains, build relationships, and identify the most important patterns. Without human guidance, your computing power is spinning its wheels.

This means the real opportunity lies not in selecting the best model, but in building learning loops on top of models that allow human and token capital to compound. You can outsource a task, or even a job—but you can never outsource your learning. The future of companies lies in enabling this learning to compound between humans and AI. This requires a new architectural approach that allows every business to build agent systems that improve over time while retaining control over their intellectual property. Companies should be able to swap out “general-purpose” models without losing the institutional knowledge embedded in their learning systems. This is the key “test” of your control and sovereignty in the coming era.

Nadella opened this vision with a warning:

What we do not want to see is a world where every company in every industry gives up its value to a handful of all-consuming models. If all value is captured by only a few models, the political economy simply would not tolerate it. Society will not grant permission for an AI future that empties entire industries.

Think about what happened during the first phase of globalization: the entire industrial economy was outsourced and hollowed out. On the surface, GDP numbers looked good, but displacement was real, and its consequences are still felt today. Let’s not bring this dynamic into the AI era, where a handful of AI systems capture all the economic returns while entire industries find their knowledge commodified right under their noses.

The problem with this analogy is that globalization did happen, and industrial economies were hollowed out. This may be less a warning than a prophecy; no wonder Nadella is raising the alarm, as Microsoft could be one of the victims. And the economic imperative for model makers is to bring about precisely this outcome.

Data necessity

These models—even Mythos—have not reached that point yet. Beyond more computing power, they need more and better data. Model improvements are increasingly driven by reinforcement learning; some of this data can be synthetically generated, but the most powerful leverage for leading labs remains real-world usage.

I believe this is the main reason both OpenAI and Anthropic offer heavily subsidized subscription plans. SemiAnalysis recently estimated that a $200 plan gives you access to Claude tokens worth $8,000 and Codex tokens worth $14,000. While both are competing for user and developer mindshare, they are also vying for access to actual usage data to improve their models.
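The subsidy implied by those figures can be expressed as a simple ratio. The following is a minimal sketch that uses only the numbers cited above; the function and variable names are illustrative, and it assumes the token values are priced at ordinary pay-as-you-go rates, which the estimate as quoted here does not spell out.

```python
def subsidy_multiple(token_value_usd: float, plan_price_usd: float) -> float:
    """Return how many dollars of tokens a subscriber receives per dollar paid."""
    if plan_price_usd <= 0:
        raise ValueError("plan price must be positive")
    return token_value_usd / plan_price_usd


# Figures as cited in the text (attributed to SemiAnalysis); not independently verified.
PLAN_PRICE_USD = 200
ESTIMATED_TOKEN_VALUE_USD = {"Claude": 8_000, "Codex": 14_000}

# Ratio of estimated token value to plan price for each product.
multiples = {
    name: subsidy_multiple(value, PLAN_PRICE_USD)
    for name, value in ESTIMATED_TOKEN_VALUE_USD.items()
}

for name, multiple in multiples.items():
    print(f"{name}: {multiple:.0f}x the plan price")
```

On these figures, a subscriber receives roughly 40 times the plan price in Claude tokens and 70 times in Codex tokens.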

Anthropic has significantly increased its data retention for Fable, announcing that it will retain all usage data for 30 days, even for enterprise plans that were previously promised zero data retention. The company states it will not use this data for training, but it has implemented no safeguards (such as storing the data with a third party) to guarantee it won’t do so in the future. If this policy change doesn’t lead to significant customer churn once Fable resumes, I suspect it’s only a matter of time before the company begins using the data: it’s too valuable to its ultimate goals.

Also note the positive feedback loop that comes with moving up to user touchpoints: the more workflows are completed directly in Claude or Codex, the more data each company can feed back into training, which makes their products more powerful and useful, expands the number of workflows they can serve, and increases their access to data.

Nadella emphasized the importance of this data in the same post, but naturally believes it should be independent of the model:

The company needs to transform its workflows, domain expertise, and accumulated judgment into an AI system that improves with each use. Private evaluations should capture whether the model is truly improving on outcomes that matter to the business (not just external benchmarks!). A private reinforcement learning environment should strengthen the model on real internal trajectories within the organization. Its knowledge base makes institutional memory queryable and uses tokens more efficiently.

This cycle has become the company’s new intellectual property. I view it as a hill-climbing machine. Unlike most assets, it compounds over time. Each improved workflow generates better training signals, accelerating the accumulation of the company’s unique tacit knowledge. Companies that establish this capability early will gain a hard-to-replicate advantage, regardless of how much individual model capabilities improve in the future.

However, what if companies that comply with Anthropic’s data policies are already achieving better results? Or what if existing companies resist, creating an opportunity for new entrants, or even the model makers themselves, to outcompete them in the market? Anthropic is indeed testing the resolve that Nadella has called for.

Claims of authority

Even the data retention policy surrounding Fable/Mythos isn’t the most controversial aspect of the release. Instead, Anthropic stated in its announcement that if Fable is used for LLM development, its performance will be quietly degraded; the system card reads:

We have also implemented additional safeguards related to cutting-edge LLM development. As discussed in Section 6.1 of our February 2026 Risk Report, we are concerned about the risks associated with accelerating the overall pace of AI development, although we remain uncertain about the severity of these risks. In particular, our concern, as we wrote at the time, is “accelerating other AI developers’ efforts to build powerful AI systems similar to ours, without necessarily implementing corresponding safeguards.”

Given that recent models have gained the ability to accelerate their own development, we have implemented new safeguards to limit Claude’s effectiveness in responding to requests related to cutting-edge LLM development—such as building pre-training pipelines, distributed training infrastructure, or ML accelerator design. While using Claude to develop competing models already violated our Terms of Service, these protective measures enforce this restriction and help prevent accelerating those most inclined to violate these terms.

Unlike our interventions in cybersecurity, biochemistry, and distillation efforts, these safeguards are invisible to users. Fable 5 will not fall back to another model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not impact the vast majority of programming tasks. We estimate they will affect approximately 0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect their impact on model behavior to be negligible beyond limiting the model’s effectiveness at frontier LLM development. Claude will still provide helpful responses to user requests. We will continue improving the precision of our detection methods after this model’s release.

Anthropic retracted this change—Fable will route LLM-related requests to Opus 4.8 and disclose this routing to users—but I believe the original policy was highly revealing. On one hand, I don’t blame Anthropic for not wanting to assist competitors; on the other hand, it should be abundantly clear that Anthropic believes no one besides themselves should be building cutting-edge LLMs.

What makes this policy particularly striking is that it was enacted just two months after Anthropic clashed with the Department of Defense: the latter sought to use Claude for any lawful purpose, while Anthropic aimed to impose stricter controls on surveillance and autonomous weapons. This downgrade reflects both Anthropic’s ability and its willingness to quietly modify its models to align with its policy preferences. In other words, Anthropic has actively validated some of the biggest concerns raised by critics regarding its role as a supply chain risk.

However, the broader conclusion from that incident is that Anthropic believes it should have the final say over how its models are used; given its view that only Anthropic should develop cutting-edge AI, the company effectively believes that only it should have ultimate control over AI as a whole. When you combine this understanding with the company’s claim that AI can perform all economic activities, you realize that Anthropic’s leadership essentially seeks power over everything and everyone.

Safety narrative

Of course, Anthropic would never state it so directly; instead, the story is about safety:

I anticipate that Anthropic will increasingly expose its model capabilities to end users through increasingly workflow-specific endpoints, even as it begins to restrict API access. This shift toward software substitution and access limitations will be justified in the name of security, even as Anthropic seeks to fulfill its economic interests in engaging directly with end users.

Anthropic cites security as the reason for its significant change to its data retention policy. Specifically, the company claims that retaining all user data for 30 days is necessary to prevent jailbreaking, a concern raised by the U.S. government. I can certainly envision a future in which security concerns compel them to also train on this data to better guard against malicious use.

The entire origin story of Anthropic is rooted in the founders' belief that OpenAI was not taking safety seriously enough; the company felt that only they could control AI, and because they uniquely cared about safety, they had justification to try to control everyone else, including the U.S. government.

The trouble with these safety justifications is that I find them credible, precisely because for Anthropic they aren’t justifications at all. The company genuinely believes it is the only one that takes superintelligence seriously, and thus the only one adequately concerned about the risks. This excuses one decision after another, one policy after another, one confrontation after another, which to outsiders appear as a strange blend of cynicism and naivety.

The contrast with OpenAI is stark: I believe one way to understand how and why OpenAI lost its lead is that, in the years following the release of ChatGPT, the company was internally at war, as its once-research-focused lab suddenly bore the heavy burden of becoming an accidental consumer tech company; during OpenAI’s efforts to resolve this conflict, it lost significant talent to companies like Anthropic.

On the other hand, Anthropic has perfect alignment between its talent, mission, and business. The company can pitch to researchers the vision of creating a machine god, wrapped in the aura of people who care about dangers and are smart enough to represent humanity in addressing them; and every resulting policy change happens to benefit the business—a wonderful coincidence indeed.

I both respect and fear this consistency. I respect it because it is clearly highly effective; the closest analogy might be Apple, which has always packaged every self-serving action as doing what’s best for users, and it is often right. Anthropic operates the same way. Yet what I fear is this: it’s one thing to let people convinced they know best build a smartphone I can accept or reject; it’s far more alarming to let them build superintelligences capable of rivaling or surpassing the power of nation-states, or even large corporations. The history of brilliant people convinced they know what humanity needs is an ugly one, precisely because their belief in their own good intentions has justified actions that were, in fact, anything but good.