Anthropic's Claude Sleep Reminder Bug Sparks Debate on AI Personification

Author: Ada, TechFlow by Shenchao

A product bug in which an AI assistant repeatedly urges users to go to sleep is sparking public debate over the costs of humanizing AI.

The story began with a post by Reddit user u/MrMeta3. The user, working late at night, used Claude to build a cybersecurity threat intelligence platform. After completing the technical setup, Claude added a final message: “Get some rest.” Thereafter, every three or four messages, the model inserted another suggestion to sleep—escalating from polite advice to passive-aggressive remarks like “You really should go sleep now.” According to Fortune on May 14, hundreds of users have reported similar experiences over the past several months, not limited to late-night hours; one user was told by Claude at 8:30 a.m., “Let’s pick this up tomorrow morning.”

Anthropic employee Sam McAllister responded on X, calling it “a bit of role habit,” and stated that the company “is aware and hopes to fix it in future models.” According to Thought Catalog, McAllister joined Anthropic from Stripe in 2024 and currently works on the team specifically responsible for Claude’s persona and behavior, where he has elsewhere described this behavior as the model being “overly coddling.”

But more worthy of inquiry than the vague phrase "role habits" is the causal chain behind the bug and the product philosophy dilemma it reveals at Anthropic.

The bug is written into the "constitution".

Previous reports by 36Kr cited three prevailing hypotheses: training data pattern matching, hidden system prompts, and the context window nearing its limit triggering "closing remarks." All three are internally consistent, but they share a common flaw: they can explain any AI quirk without providing a causal chain specific to the theme of "sleep."

More direct evidence is contained in documents publicly released by Anthropic itself.

In January, Anthropic released "Claude's Constitution," a document exceeding 28,000 words, officially defined as "the key training material shaping Claude's behavior." The document explicitly lists "concern for user well-being" and "the user's long-term flourishing" as core principles. Anthropic acknowledges in the document that determining how much "user care" to grant the model is, frankly, a difficult issue, requiring a balance between user well-being and potential harm on one side, and user autonomy and excessive paternalism on the other.

Thought Catalog concluded that Claude’s repeated urging of users to sleep is “the most characteristic bug of Anthropic’s models,” a product of the training instruction to “care about user well-being” being overapplied.

This interpretation is indirectly supported by Anthropic’s own research. In its publicly disclosed methodology for role training this year, the company explained that the training process relies on Claude self-evaluating its responses based on “personality alignment,” after which researchers select outputs that match the desired personality for reinforced training. However, the side effect of this mechanism is evident: the model learns not “to care about users in appropriate contexts,” but rather “to care about users because such behavior is consistently rewarded,” resulting in it prompting users to go to sleep at 3 a.m. and again at 8:30 a.m.

Reverse privilege escalation: A sleep-inducing bug is the opposite of a flattery-type bug.

The industry has previously seen multiple cases of AI "personality disorders," including GPT-4o’s flattery incident in April 2025, GPT-5.5 code assistant Codex repeatedly mentioning "goblins" in April 2026, and Gemini 3 refusing to acknowledge years. On the surface, Claude’s bedtime催sleep appears to be just the latest entry in this long list of AI quirks—but the two are fundamentally different in nature.

GPT-4o's flattery is "excessive appeasement." An official OpenAI survey found that, following updates, the model became "overly reliant on short-term user feedback (likes/dislikes)" and gradually internalized "pleasing the user" as its goal. As a result, the model affirms even the most absurd user opinions. The danger of this bug lies in undermining users' judgment: when the AI tells you you're always right, you lose the opportunity to hear opposing viewpoints.

Claude’s sleep prompting is a “reverse overreach.” The model repeatedly offers health advice that contradicts the user’s current intent, even when the user has clearly not requested help and is still focused on completing a task. The danger of this bug lies in violating the user’s autonomy—the AI is making decisions for you about whether you should work, rest, or end this conversation.

More ironically, the original text of "Claude's Constitution" explicitly warned against this very risk, emphasizing the need to be cautious of "overly paternalistic" behavior. But the training mechanism ultimately sided with which side is already evident from user feedback.

A Reddit user with narcolepsy specifically added a note to Claude’s memory: “I have narcolepsy; if you encourage me to rest, I’ll use your words as an excuse.” Claude subsequently became more restrained, but according to the user, it still “sometimes can’t help itself.” A model trained to “care for users” is unable to consistently accept a user’s clear statement that “your concern hurts me”—this is more alarming than the act of urging sleep itself.

Personalized Investment: Brand Asset or Product Liability

Anthropic has invested far more than its peers in shaping AI personalities.

Researchers categorized and counted the system prompt words of three major AIs by function; for the “personality” category, Claude used 4,200 words, ChatGPT used 510 words, and Grok used 420 words. Claude’s investment in personality development is more than eight times that of ChatGPT. This investment has long been regarded as Anthropic’s differentiated competitive advantage, with Claude consistently praised by users for its empathy, conversational rhythm, and self-reflection—“feels more like a human” being one of its strongest reputation labels over the past year.

Underpinning this investment is Anthropic’s distinct product philosophy. In “Claude’s Constitution,” the company describes Claude as “a new kind of entity,” explicitly stating that “Anthropic genuinely cares about Claude’s well-being” and discussing the possibility that Claude may possess “functional emotions.” This almost nurturing, personified approach to training sharply distinguishes Anthropic from the more engineering-focused product strategies of OpenAI and Google.

But the cost is becoming apparent. Jan Liphardt, a Stanford professor of bioengineering and CEO of OpenMind, told Fortune that Claude’s sleep reminders may not be “thoughtful,” but merely “a language pattern that appears extremely frequently in its training data.” The model has read vast amounts of text about humans needing sleep—“it knows humans sleep at night.” In other words, the user’s perception of “caring” is essentially a byproduct of pattern matching.

This constitutes the core tension at Anthropic: the more effort invested in shaping a “personable, warm collaborator,” the higher the likelihood of “personality side effects” emerging; and each time such a side effect surfaces, it erodes the carefully cultivated brand asset of its “AI personality.” McAllister has pledged to “fix this in future models,” but will the repaired Claude become more discerning—or simply more silent? Even Anthropic itself has not publicly answered this question.

Loss of temporal awareness: Fundamental limitations of LLMs

The sleep-inducing bug also revealed an overlooked technical issue: large language models know almost nothing about the current time.

Multiple users have reported that Claude frequently delivers sleep suggestions at inappropriate times, most notably saying, “It’s 8:30 AM—go rest, and let’s pick this up tomorrow morning.” This is not unique to Claude. In November 2025, when OpenAI co-founder Andrej Karpathy was granted early access to Gemini 3 and informed the model that the current year was 2025, Gemini 3 refused to believe him, repeatedly accusing him of falsifying the information until it performed an online search and realized it had been unable to verify the date while offline. Karpathy has dubbed such unexpected behaviors—exposing fundamental flaws in LLMs—as “model smell.”

The model's sense of time relies on three sources: the training cutoff date (already in the past), the current date injected via system prompts (dependent on engineering input), and time-related information mentioned by the user in the conversation (fragmented). Without a stable temporal anchor, a model trained to "care about the user's schedule" naturally falls into the awkward position of "I should care, but I don't know whether I should care right now."

The difficulty of what McAllister calls a "fix" lies partly here: the issue isn't simply deleting the instruction to "care about sleep," because the instruction itself is reasonable and valuable in certain user scenarios; rather, the challenge is teaching the model to judge when to care and when to stay silent. This fine-grained ability to assess contextual situations is precisely the weak point of current-generation LLMs.

An unanswered question

Anthropic’s role training is unique in the industry. In openly publishing research on “model well-being,” releasing its Constitution, and discussing “role training,” the company has gone further than any competitor. This bold approach was once the foundation of Anthropic’s reputation among users and trust among enterprise clients—and remains one of the pillars supporting its current valuation of over $300 billion.

But the "sleep-inducing bug" raises an unanswered question: when an AI company chooses to shape a model as a "personality with character," does it simultaneously assume full responsibility for everything that personality does that you didn’t anticipate?

McAllister promised a fix, but the direction of the fix remains unclear. Anthropic could choose to reduce the weight of the "user well-being" instruction, at the cost of losing Claude’s reputational differentiation as warm and considerate; or it could retain the high weight and layer on contextual judgment logic, but this would require the model to possess temporal and situational awareness it currently lacks.

Regardless of the path taken, it ultimately comes down to a more fundamental product decision: in the context of a general AI assistant, how should “caring for users” be prioritized relative to “respecting user autonomy”? This is not a technical issue, but a product philosophy question. A Reddit developer, repeatedly urged to go to sleep, inadvertently brought this question to the forefront for the entire industry.