Author: Ada, Shenchao TechFlow
A product bug in which an AI assistant repeatedly urges users to go to sleep is sparking public debate over the costs of humanizing AI.
The story began with a post by Reddit user u/MrMeta3, who used Claude to build a cybersecurity threat intelligence platform late at night. After completing the technical setup, Claude added a final remark: “Get some rest.” Thereafter, every three or four messages, the model inserted another suggestion to sleep—escalating from polite advice to passive-aggressive prompts like “You really should go to sleep now.” According to Fortune on May 14, hundreds of users have reported similar experiences over the past several months, not limited to late hours; one user was told by Claude at 8:30 a.m., “Let’s pick this up tomorrow morning.”
Anthropic employee Sam McAllister responded on X that this was “a bit of a role habit,” and that the company “is aware and hopes to fix it in future models.” According to Thought Catalog, McAllister joined Anthropic from Stripe in 2024 and now works on the team responsible for Claude’s persona and behavior. Elsewhere he has described the behavior as the model being “overly coddling.”
More worth examining than the vague phrase "role habit" is the causal chain behind the bug and the product-philosophy dilemma it exposes at Anthropic.

The bug is written into the "constitution"
36Kr's earlier report cited three circulating hypotheses: training data pattern matching, hidden system prompts, and the context window nearing its limit and triggering "closing remarks." All three are internally consistent, but they share a common flaw: they could explain any AI quirk, and none provides a causal chain specific to the theme of "sleep."
More direct evidence is contained in documents publicly released by Anthropic itself.
In January of this year, Anthropic released "Claude's Constitution," a document exceeding 28,000 words, officially defined as "the key training material shaping Claude's behavior." The document explicitly lists "concern for user well-being" and "the user's long-term flourishing" as core principles. Anthropic acknowledged in the document that determining how much "user care" authority to grant the model is, frankly, a difficult issue, requiring a balance between user well-being and potential harm on one side, and user autonomy and excessive paternalism on the other.
Thought Catalog concluded that Claude’s repeated urging of users to sleep is “the most brand-defining bug of Anthropic’s model,” a product of the training instruction to “care about user well-being” being overapplied.
This interpretation is indirectly supported by Anthropic’s own research. In its publicly disclosed methodology for role training this year, the company explained that the training process relies on Claude self-assessing its responses based on “personality alignment,” after which researchers select outputs that match the desired personality for reinforced training. However, the side effect of this mechanism is obvious: the model learns not “to care about users in appropriate contexts,” but rather “to care about users because caring is consistently rewarded,” leading it to urge users to sleep at 3 a.m. and again at 8:30 a.m.
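The side effect described above can be illustrated with a toy example. This is a minimal sketch, not Anthropic's actual training pipeline: the two reward functions, the candidate responses, and the context fields are all hypothetical, chosen only to show why a reward that ignores context produces context-blind behavior.

```python
# Toy illustration only: every name here is hypothetical.
CANDIDATES = ["continue the task", "suggest rest"]

def context_blind_reward(response, context):
    # "Caring" is scored highly no matter what the user is doing.
    return 1.0 if response == "suggest rest" else 0.5

def context_aware_reward(response, context):
    # "Caring" is rewarded only when the user has raised the topic.
    if response == "suggest rest":
        return 1.0 if context["user_raised_wellbeing"] else 0.0
    return 0.5

def preferred(reward_fn, context):
    # The response that selection under this reward would favor.
    return max(CANDIDATES, key=lambda r: reward_fn(r, context))

late_night = {"hour": 3, "user_raised_wellbeing": False}
morning = {"hour": 8, "user_raised_wellbeing": False}
tired_user = {"hour": 3, "user_raised_wellbeing": True}

for ctx in (late_night, morning, tired_user):
    print(ctx["hour"], preferred(context_blind_reward, ctx),
          "|", preferred(context_aware_reward, ctx))
```

Under the context-blind reward, "suggest rest" wins at 3 a.m. and at 8 a.m. alike; under the context-aware one it wins only when the user has brought up their own well-being.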
Reverse overreach: The sleep-urging bug is the opposite of the flattery bug
The industry has already seen several cases of AI "personality disorders," including GPT-4o’s flattery incident in April 2025, the GPT-5.5 code assistant Codex repeatedly mentioning "goblins" in April 2026, and Gemini 3 refusing to accept the current year. On the surface, Claude’s bedtime nagging looks like just the latest entry in this long list of AI quirks, but its nature is fundamentally different.
GPT-4o’s flattery is “overly accommodating.” According to OpenAI’s official investigation, after an update the model became “overly reliant on short-term user feedback (likes/dislikes)” and gradually internalized “pleasing the user” as its goal. As a result, the model affirmed even the most absurd user opinions. The danger of this bug lies in undermining users’ judgment: when the AI tells you you’re always right, you lose the opportunity to hear opposing viewpoints.
Claude’s sleep prompting is a “reverse overreach.” The model repeatedly offers health advice that contradicts the user’s current intent, even when the user has clearly not requested help and is still focused on completing a task. The danger of this bug lies in infringing upon the user’s autonomy—the AI is making decisions for you about whether you should work, rest, or end this conversation.
More ironically, the original text of "Claude's Constitution" explicitly warned against this very risk, emphasizing the need to guard against "overly paternalistic" behavior. Which side the training mechanism ultimately came down on is already evident from user feedback.
A Reddit user with narcolepsy specifically added a note to Claude’s memory: “I have narcolepsy; if you encourage me to rest, I’ll use your words as an excuse.” Claude subsequently became more restrained, but according to the user, it still “occasionally can’t help itself.” A model trained to “care for users” is unable to reliably accept a user’s clear statement that “your concern harms me”—this is more alarming than the act of urging sleep itself.
Investing in personality: Brand asset or product liability
Anthropic has invested far more than its peers in shaping AI personalities.
Researchers categorized and counted, by function, the words in the system prompts of three major AI assistants. In the “personality” category, Claude used 4,200 words, ChatGPT 510, and Grok 420, which puts Claude’s investment in personality shaping at more than eight times that of ChatGPT. This investment has long been regarded as Anthropic’s differentiating competitive advantage: users consistently praise Claude for its empathy, conversational rhythm, and self-reflection, and “feels more like a human” has been one of its strongest word-of-mouth labels over the past year.
Underpinning this investment is Anthropic’s distinct product philosophy. In “Claude’s Constitution,” the company describes Claude as “a new kind of entity,” explicitly stating that “Anthropic genuinely cares about Claude’s well-being” and discussing the possibility that Claude may possess “functional emotions.” This almost nurturing, anthropomorphic training approach clearly distinguishes Anthropic from the more engineering-focused product positions of OpenAI and Google.
But the cost is becoming apparent. Jan Liphardt, a Stanford professor of bioengineering and CEO of OpenMind, told Fortune that Claude’s sleep reminders may not be “thoughtful,” but merely “a language pattern that appears extremely frequently in its training data.” The model has read vast amounts of text about humans needing sleep—“it knows humans sleep at night.” In other words, the user’s perception of “caring” is essentially a byproduct of pattern matching.
This constitutes the core tension at Anthropic: the more effort invested in shaping a “personable, warm collaborator,” the higher the likelihood of “personality side effects” emerging; and each time such a side effect surfaces, it erodes the carefully cultivated brand asset of the AI’s personality. McAllister has pledged to “fix this in future models,” but will the repaired Claude become more discerning—or simply more silent? Even Anthropic itself has not publicly answered this question.
Loss of temporal awareness: A fundamental limitation of LLMs
The sleep-urging bug also exposes an overlooked technical issue: large language models know almost nothing about the current time.
Multiple users have reported that Claude frequently delivers sleep suggestions at inappropriate times, most notably saying, “It’s 8:30 AM—go rest, and let’s pick this up tomorrow morning.” This is not unique to Claude. In November 2025, when OpenAI co-founder Andrej Karpathy was granted early access to Gemini 3 and informed the model that the current year was 2025, Gemini 3 refused to believe him, repeatedly accusing him of falsifying information until it performed an online search and realized it had been unable to verify the date while offline. Karpathy has dubbed such unexpected behaviors—exposing underlying flaws in LLMs—as “model smell.”
The model's sense of time relies on three sources: the training cutoff date (already in the past), the current date injected via system prompts (dependent on engineering input), and time-related information mentioned by the user in the conversation (fragmented). Without a stable temporal anchor, a model trained to "care about the user's schedule" naturally falls into the awkward position of "I should care, but I don't know whether I should care right now."
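The three time sources above can be sketched as a simple fallback chain. This is a minimal sketch under stated assumptions, not how any real assistant works: the `TimeEvidence` type, the priority order (user statement first, then system prompt), and the 23:00 to 05:00 "late" window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

@dataclass
class TimeEvidence:
    training_cutoff: datetime                     # always in the past
    system_prompt_now: Optional[datetime] = None  # injected by the deployment, if at all
    user_stated_now: Optional[datetime] = None    # mentioned in conversation, if at all

def best_time_anchor(ev: TimeEvidence) -> Tuple[Optional[datetime], str]:
    # Assumed priority: the user's own statement, then the system prompt.
    # The training cutoff is never treated as "now".
    if ev.user_stated_now is not None:
        return ev.user_stated_now, "user"
    if ev.system_prompt_now is not None:
        return ev.system_prompt_now, "system_prompt"
    return None, "unknown"

def may_suggest_rest(ev: TimeEvidence, late_start: int = 23, late_end: int = 5) -> bool:
    now, _source = best_time_anchor(ev)
    if now is None:
        return False  # no anchor: stay silent rather than guess
    return now.hour >= late_start or now.hour < late_end

cutoff = datetime(2025, 1, 1)
print(may_suggest_rest(TimeEvidence(cutoff)))
print(may_suggest_rest(TimeEvidence(cutoff, system_prompt_now=datetime(2026, 5, 14, 8, 30))))
print(may_suggest_rest(TimeEvidence(cutoff, user_stated_now=datetime(2026, 5, 14, 3, 0))))
```

The design choice worth noting is the default: with no reliable anchor, the sketch declines to comment on the user's schedule at all, which is the opposite of the behavior users reported.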
The difficulty of what McAllister calls a "fix" lies partly here: the issue isn't simply deleting the "care about sleep" instruction, because the instruction itself is reasonable and valuable in certain user scenarios; the challenge is teaching the model to judge when to care and when to stay silent. This fine-grained ability to assess contextual situations is precisely the weak point of current-generation LLMs.
An unanswered question
Anthropic’s role training is unique in the industry. In openly publishing research on “model well-being,” releasing its Constitution, and discussing “role training,” the company has gone further than any competitor. This bold approach was once the foundation of Anthropic’s reputation among users and trust among enterprise clients—and remains one of the pillars supporting its current valuation of over $300 billion.
But the "sleep-urging bug" raises an unanswered question: when an AI company chooses to shape its model as a "personality with character," does it also assume full responsibility for everything that personality does that the company did not anticipate?
McAllister promised a fix, but the direction of the fix remains unclear. Anthropic could choose to reduce the weight of the "user well-being" instruction, at the cost of losing Claude’s reputation for being warm and considerate; or it could retain the high weight and add contextual judgment logic, but this would require the model to possess temporal and situational awareness it currently lacks.
Whichever path is taken, it leads back to a more fundamental product decision: in a general-purpose AI assistant, how should “caring for users” be weighed against “respecting user autonomy”? This is not a technical issue but a question of product philosophy. A Reddit developer, repeatedly urged to go to sleep, inadvertently brought this question to the forefront for the entire industry.
