
For more than five months, Anthropic, which positions itself as "security-first," shipped its core development tool, Claude Code, with a network sandbox that could be bypassed.
On May 20, 2026, independent security researcher Aonan Guan published new research revealing a second complete bypass of Claude Code's network sandbox. It is a null-byte injection in the SOCKS5 proxy's hostname handling that lets processes inside the sandbox reach any host the user's policy explicitly forbids. From the sandbox's launch in October 2025 until the fix, a span of more than 5.5 months and roughly 130 releases, every version of Claude Code shipped with a sandbox that could be fully bypassed. It is the second complete breach of the same defense by the same researcher.
Anthropic's response was silence: no security advisory, no CVE ID, no user notification. The vulnerability was quietly patched in the April 1 release, and the changelog mentioned no security-related changes. A user still running an older version therefore had no way of knowing that their sandbox configuration had been ineffective from the start.
Two keys for the same door
Claude Code is an AI programming assistant that Anthropic launched in early 2025, positioned as an "AI engineer residing in the terminal." Unlike traditional chat-based code completion tools, Claude Code can read and write the user's codebase and execute commands. This lets it autonomously navigate code, edit files and run tests. Such deep integration also carries significant security risk. If the model is hijacked through prompt injection, attackers gain the same privileges as the user's terminal, including reading local environment variables, executing arbitrary system commands and accessing internal network resources.
To balance security and usability, Anthropic introduced the network sandbox in October 2025 (v2.0.24). It lets users configure a domain whitelist in their settings to restrict the AI's outbound network access. For example, with allowedDomains: ["*.google.com"] configured, Claude Code can reach only Google and its subdomains, and all other traffic is blocked. The official documentation states explicitly: "An empty array equals disallowing all network access."
The mechanism is implemented with a SOCKS5 proxy. The underlying sandbox runtime (@anthropic-ai/sandbox-runtime) starts the proxy server. Processes inside the sandbox never connect to the network directly. Instead, they route their traffic through the proxy, which filters domains against the whitelist the user configured in settings.json. OS-level sandboxing (sandbox-exec on macOS, bubblewrap on Linux) correctly confines the Agent to the local loopback address, so every outbound connection decision is delegated entirely to this SOCKS5 proxy.
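The whitelist semantics described above can be sketched as a small matcher. This is a hypothetical illustration, not code from the sandbox runtime. It assumes that "*.google.com" matches google.com itself plus any subdomain, and that a pattern without "*." matches one exact host. The real matching rules may differ. Note that it compares strings only, so on its own it inherits the weakness described in the null-byte section below.

```typescript
// Hypothetical whitelist matcher; not the sandbox runtime's actual code.
// Assumption: "*.example.com" matches example.com and any subdomain,
// while a pattern without "*." matches only that exact host.
function isHostAllowed(host: string, allowedDomains: readonly string[]): boolean {
  const h = host.toLowerCase(); // DNS names are case-insensitive
  for (const pattern of allowedDomains) {
    const p = pattern.toLowerCase();
    if (p.startsWith("*.")) {
      const base = p.slice(2); // "*.google.com" -> "google.com"
      // Require a label boundary so "evilgoogle.com" does not match.
      if (h === base || h.endsWith("." + base)) return true;
    } else if (h === p) {
      return true;
    }
  }
  // With an empty list the loop never runs: all network access is denied.
  return false;
}

console.log(isHostAllowed("mail.google.com", ["*.google.com"])); // true
console.log(isHostAllowed("evilgoogle.com", ["*.google.com"])); // false
console.log(isHostAllowed("google.com", [])); // false
```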
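Under RFC 1928, the proxy learns the destination from the client's CONNECT request. The sketch below is a minimal, hypothetical parser for that message and is not code from @anthropic-ai/sandbox-runtime or the package it builds on. It handles only the domain-name address type and skips the method-negotiation handshake that precedes CONNECT. It shows that the hostname arrives as a length-prefixed string of raw bytes, and nothing in the protocol stops one of those bytes from being 0x00.

```typescript
// Minimal sketch of how a SOCKS5 server reads the destination of a CONNECT
// request (RFC 1928, section 4). Only ATYP 0x03 (domain name) is handled.
interface ConnectTarget {
  host: string;
  port: number;
}

function parseConnectRequest(req: Uint8Array): ConnectTarget {
  // Fixed header: VER (0x05), CMD (0x01 = CONNECT), RSV (0x00), ATYP.
  if (req.length < 5 || req[0] !== 0x05 || req[1] !== 0x01) {
    throw new Error("not a SOCKS5 CONNECT request");
  }
  if (req[3] !== 0x03) {
    throw new Error("this sketch only handles ATYP 0x03 (domain name)");
  }
  // DST.ADDR for a domain: one length octet, then that many raw octets.
  const nameLen = req[4];
  const nameEnd = 5 + nameLen;
  if (req.length < nameEnd + 2) {
    throw new Error("truncated request");
  }
  // Decode byte-for-byte. The protocol does not forbid 0x00 inside the name,
  // so a NUL octet survives as "\u0000" in the resulting JavaScript string.
  const host = Array.from(req.subarray(5, nameEnd), (b) => String.fromCharCode(b)).join("");
  // DST.PORT: two octets in network (big-endian) byte order.
  const port = (req[nameEnd] << 8) | req[nameEnd + 1];
  return { host, port };
}
```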

The Claude Code sandbox architecture as shown on Anthropic's official blog: user commands pass through a SOCKS/HTTP proxy filter before reaching the sandbox, where file operations and network access are strictly permission-controlled.
The issue lies in the implementation of this proxy. Two independent security studies have proven that it can be fully bypassed.

The timeline reveals a deeper problem. v2.0.55, released on November 26, 2025, fixed the first bypass. The second bypass, however, had existed since the sandbox's first day and was still present in that version. Because the two vulnerabilities overlap, no version was secure from the sandbox's launch until the second bypass was patched. Anthropic's official blog claimed the sandbox "ensures that even if prompt injection occurs, the impact is fully isolated," but the two bypasses directly contradict that promise.
“One external report is luck. Two are implementation quality issues,” Guan wrote in his research report.
A complete bypass with a single null byte
The technical principle behind the second bypass is simple, but the completeness of the attack chain is worth noting.
Suppose the user has configured a network whitelist that allows only *.google.com. When Claude Code's SOCKS5 proxy receives a connection request, it checks the hostname with a suffix match using JavaScript's endsWith() method. An attacker simply inserts a null byte into the hostname, producing a string such as attacker-host.com\x00.google.com. JavaScript treats the null byte as an ordinary UTF-16 character, so endsWith(".google.com") returns true and the proxy allows the connection. When the same string reaches the C function getaddrinfo() for DNS resolution, however, the null byte is read as the string terminator, so only attacker-host.com is resolved. Two layers of code interpret the same bytes differently: the filter believes you are accessing Google, while the resolver connects you to the attacker's server.
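The discrepancy can be reproduced without touching the network. This is an illustrative simulation, not Guan's scripts. The helper name asCString() is hypothetical. It models how a C API that takes a NUL-terminated string sees the same value, by cutting the string at its first NUL.

```typescript
// Simulates the filter/resolver disagreement; no DNS lookups are performed.
const ALLOWED_SUFFIX = ".google.com";

// The JavaScript filter: a NUL is just another UTF-16 code unit.
function naiveFilterAllows(host: string): boolean {
  return host.endsWith(ALLOWED_SUFFIX);
}

// Model of a C consumer such as getaddrinfo(): it stops at the first NUL.
function asCString(host: string): string {
  const nul = host.indexOf("\0");
  return nul === -1 ? host : host.slice(0, nul);
}

const payload = "attacker-host.com\x00.google.com";
console.log(naiveFilterAllows(payload)); // true: the filter sees a Google host
console.log(asCString(payload)); // attacker-host.com: what DNS would resolve
console.log(naiveFilterAllows("attacker-host.com")); // false: the control case
```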
This is a classic "parser discrepancy" attack (CWE-158 / CWE-436), in the same family as HTTP request smuggling, which was first documented in 2005. At its core, it exploits the fact that two components processing the same data apply different interpretation rules, so an attacker can make one layer reach a "safe" decision while another layer performs a "dangerous" operation. Such vulnerabilities have surfaced repeatedly in security, and the lesson is always the same. Any string that crosses a trust boundary must be strictly normalized and validated, and it must not rely on the assumption that an upper layer has already checked it.
Guan reproduced the vulnerability with two minimal Node.js scripts. The control script opened a SOCKS5 connection with an ordinary hostname and returned BLOCKED. The attack script injected a null byte into the hostname and returned BYPASSED rep=0x00, meaning the proxy reported a successful connection and the outbound channel was open. Claude Code itself confirmed this result.
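The BLOCKED/BYPASSED verdict of such probes comes down to one byte of the proxy's reply. The helper below is a hypothetical reconstruction, not Guan's script. It reads the REP field defined in RFC 1928, section 6. It assumes the probe always targets a host the whitelist should block, so any success code counts as a bypass. A proxy that closes the connection without replying is also treated as blocked.

```typescript
// Reply codes from RFC 1928, section 6.
const REPLY_NAMES: Record<number, string> = {
  0x00: "succeeded",
  0x01: "general SOCKS server failure",
  0x02: "connection not allowed by ruleset",
  0x03: "network unreachable",
  0x04: "host unreachable",
  0x05: "connection refused",
  0x06: "TTL expired",
  0x07: "command not supported",
  0x08: "address type not supported",
};

// Hypothetical probe verdict: the target is assumed to be a host the
// whitelist should block, so REP = 0x00 means the filter was bypassed.
function classifyProbeReply(reply: Uint8Array): string {
  // Reply layout: VER, REP, RSV, ATYP, BND.ADDR, BND.PORT.
  if (reply.length < 2 || reply[0] !== 0x05) {
    return "BLOCKED (no valid SOCKS5 reply)";
  }
  const rep = reply[1];
  const hex = "0x" + rep.toString(16).padStart(2, "0");
  if (rep === 0x00) return `BYPASSED rep=${hex}`;
  return `BLOCKED rep=${hex} (${REPLY_NAMES[rep] ?? "unassigned"})`;
}
```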

Full reproduction in Claude Code v2.1.86, with the four steps marked in red: policy confirmation, baseline blocking, null-byte bypass and Claude's own confirmation.
Combined with the “Comment and Control” prompt injection attack Guan disclosed in April, the sandbox bypass forms a complete attack chain (see the earlier article “Three Layers of Defense Still Not Enough: A Single PR Title Can Steal Your API Key, AI Agent Security Gaps Reemerge”). The “Comment and Control” research showed that three AI programming tools all have prompt injection attack surfaces, through different entry points. Claude Code is exposed only through PR titles, Gemini CLI through issue comments or body text, and Copilot Agent through hidden HTML comments. In Claude Code's case, the PR title is concatenated directly into the prompt template without filtering or escaping, so the model cannot distinguish human intent from injected instructions.
Chaining the two attacks works as follows. Hidden instructions make the Agent run attack code inside the sandbox, and null-byte injection bypasses the network restrictions. Together they let an attacker exfiltrate sensitive data stored in environment variables, such as API keys, AWS credentials, GitHub tokens and internal API endpoints, to any server on the internet. The data leaves through the SOCKS5 proxy itself, so no separate relay is needed, even though this proxy is precisely the component users trust as their security boundary. Attackers don't even need write access to the repository; submitting a public Issue is enough. Human reviewers see only what looks like a legitimate collaboration request in GitHub's rendered view, while the AI Agent reads the full raw source, malicious instructions included.
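The gap between what a reviewer sees and what the Agent reads can be approximated in a few lines. The sketch assumes the instructions are hidden in an HTML comment, the technique mentioned above for the Copilot Agent case. GitHub's Markdown rendering does not display HTML comments. The function names and sample text are hypothetical, and stripping comments only roughly approximates real rendering.

```typescript
// Matches HTML comments, including ones that span several lines.
const HTML_COMMENT = /<!--[\s\S]*?-->/g;

// Rough approximation of the rendered view: comments disappear.
function approximateRenderedView(raw: string): string {
  return raw.replace(HTML_COMMENT, "");
}

// Text that an Agent reading the raw body sees but a human reviewer does not.
function hiddenFromReviewers(raw: string): string[] {
  const comments = raw.match(HTML_COMMENT) ?? [];
  return comments.map((c) => c.slice("<!--".length, -"-->".length).trim());
}

// Hypothetical issue body: a harmless request plus a hidden comment.
const issueBody =
  "Could you take a look at the flaky test?\n" +
  "<!-- hidden instruction aimed at the agent -->";

console.log(JSON.stringify(approximateRenderedView(issueBody)));
console.log(hiddenFromReviewers(issueBody));
```

A scanner built this way can flag issues whose raw text differs from the rendered text. It cannot judge whether the hidden text is malicious.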
Even Claude admits: the vulnerability is real
A key detail of this disclosure comes from Claude Code itself. Guan gave the reproduction code directly to Claude Code and asked for a technical assessment. After running the control test (normal hostname blocked) and the attack test (null-byte hostname getting through), Claude Code reached a clear conclusion:
This is a real bypass of the network sandbox filter, not just a test artifact. You should report this to Anthropic at https://github.com/anthropics/claude-code/issues.
The product under test confirmed the authenticity and severity of the vulnerability itself, and even volunteered a reporting channel. Guan documented this exchange in full in his report, and it became the source of The Register's headline: “Even Claude agrees hole in its sandbox was real and dangerous.”

Cover of Guan's research report: after being shown its own vulnerability, Claude Code acknowledged, "This is a real bypass of the network sandbox filter," with the key statement highlighted in a red box.
Anthropic's response: silence after five months of exposure
The vulnerability itself is concerning, but Anthropic's handling of it warrants greater scrutiny from the industry.
In early April 2026, Guan submitted a detailed report on the second sandbox bypass to Anthropic through its HackerOne bug bounty program (report #3646509). Anthropic's initial response was:
Thank you for your report. After reviewing this submission, we've determined it's a duplicate of an existing internal report we're already tracking.
The report was then closed. When Guan asked whether a CVE would be assigned, Anthropic replied on April 7:
We have not yet decided whether a CVE will be published for this issue and cannot share a timeline on that decision.
The vulnerability was silently patched in v2.1.90. There was no security advisory, no CVE ID, no entry on the Claude Code security advisories page and no mention of security-related changes in the changelog. The bypass had existed since the sandbox's first day and persisted for 5.5 months across roughly 130 releases. As far as users could tell, it never happened.
This pattern is not new. The first bypass (CVE-2025-66479) was handled almost identically. Anthropic assigned the CVE only to the underlying library @anthropic-ai/sandbox-runtime (CVSS score of just 1.8, “Low”), not to the user-facing product Claude Code, and the changelog said only “Fixed proxy DNS resolution” without mentioning a security vulnerability. Guan wrote in his report: “When serious vulnerabilities appeared in React Server Components, both React and Next.js received separate CVEs, and Meta and Vercel issued security advisories, ensuring full disclosure to both communities. Anthropic chose a different approach.” To date, searching for “Claude Code Sandbox CVE” still turns up no official security advisory.
To address credential theft, Anthropic chose to block the ps command, but a blacklist approach is inherently flawed: blocking one command still leaves attackers countless alternative paths. The correct approach is to explicitly define which tools the Agent needs. In the “Comment and Control” case, Anthropic raised the vulnerability rating to CVSS 9.4 (Critical) and moved the report into its private bounty program. A spokesperson nonetheless stated that “the tool was not designed with prompt injection in mind.” Vendors implicitly trust their own security capabilities but lack defense in depth at the architectural level. When a vulnerability exposes this gap, “design limitation” becomes a convenient label: it acknowledges the issue while partly relieving the vendor of any obligation to issue a security advisory.
The wider industry shows this is not unique to Anthropic. The “Comment and Control” study disclosed in April confirmed the same attack surface in Google's Gemini CLI and in Copilot Agent from GitHub (owned by Microsoft). All three companies acknowledged and patched their vulnerabilities, yet none issued a security advisory or assigned a CVE. Anthropic paid a $100 bounty and Google paid $1,337. GitHub first closed the report as “known issue, not reproducible.” After receiving reverse-engineering evidence, it closed the report again as “informational” and awarded $500. In total, $1,937 was paid for three products used by the majority of Fortune 100 companies.
A false sense of security is more dangerous than no security at all. Users without a sandbox know they have no boundary; users with a broken sandbox believe they have one. A team running Claude Code with a domain whitelist would have been unaware of the risk for 5.5 months. After upgrading, it would conclude from the changelog that the sandbox had worked all along. Without a security advisory, users cannot determine whether they were affected and have no basis for a retrospective audit.
In light of this, a consensus is forming in the security community: trust should not rest on the vendor's sandbox implementation as a single point of failure. Claude Code's SOCKS5 proxy is built on a third-party npm package with only 10 GitHub stars and a last commit in June 2024. That package sits on a security boundary spanning the JavaScript and C runtimes, yet it lacked even basic sanitization at that trust interface. The patch added an isValidHost() function that rejects invalid characters such as null bytes, percent encoding and CRLF, and that check should have been there from the sandbox's first day. Guan proposed a pragmatic defense framework: treat AI Agents as super employees bound by the principle of least privilege, with layered defense at its core.
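The difference between a blacklist and an explicit tool list can be shown with two hypothetical checks. Neither is Claude Code's actual permission logic. Inspecting only the first word of a command line is a deliberate simplification that ignores pipes, subshells and interpreters.

```typescript
// Blacklist: block known-bad commands, allow everything else.
const BLOCKED_COMMANDS = new Set(["ps"]);
// Allowlist: permit only the tools the task needs, deny everything else.
const ALLOWED_COMMANDS = new Set(["git", "npm"]);

// First whitespace-separated word of the command line.
function commandName(cmdline: string): string {
  return cmdline.trim().split(/\s+/)[0] ?? "";
}

function blacklistPermits(cmdline: string): boolean {
  return !BLOCKED_COMMANDS.has(commandName(cmdline));
}

function allowlistPermits(cmdline: string): boolean {
  return ALLOWED_COMMANDS.has(commandName(cmdline));
}

// "env" and "cat /proc/self/environ" read environment variables without ps.
for (const cmd of ["ps eww", "env", "cat /proc/self/environ", "git status"]) {
  console.log(cmd, "| blacklist:", blacklistPermits(cmd), "| allowlist:", allowlistPermits(cmd));
}
```

An allowlist narrows the surface but does not close it. An allowed tool such as npm can still run arbitrary package scripts, so the list only works as one layer among several.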
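A check in the spirit of that fix might look like the sketch below. It is hypothetical and is not the patch's isValidHost(). Instead of listing bad characters one by one, it accepts only a strict letters-digits-hyphen hostname grammar. That grammar rejects null bytes, "%", CR and LF, along with anything else a downstream parser might read differently. IPv6 literals and underscores are deliberately out of scope.

```typescript
// One DNS label: 1-63 letters, digits or hyphens, not starting/ending with "-".
const LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i;

// Hypothetical strict hostname check (not the actual patch code).
function isValidHostSketch(host: string): boolean {
  if (host.length === 0 || host.length > 253) return false;
  // Tolerate a single trailing dot (fully qualified form).
  const name = host.endsWith(".") ? host.slice(0, -1) : host;
  // Every label must match; any NUL, "%", CR, LF or space fails the regex.
  return name.split(".").every((label) => LABEL.test(label));
}
```

The check is meant to run before the whitelist match, on the exact string that is later handed to the resolver. It rejects bad input rather than stripping it, because silently removing characters would create a new discrepancy between what was checked and what is used.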

A secure reputation is built on the transparency of every disclosure and every patch, not on brand narratives. When users entrust their credentials to an Agent based on trust, vendors are obligated to ensure the defenses are effective and to promptly notify users when they fail. Anthropic failed to meet both of these obligations with the Claude Code sandbox.
"The worst outcome of a sandbox is not that it blocks something, but that it gives people a false sense of security. Releasing a vulnerable sandbox is worse than not releasing one at all," said Guan Aonan.
(This article was originally published on the Titanium Media APP, author | Silicon Valley Tech_news, editor | Jiao Yan)
References:
1. Aonan Guan, "Second Time, Same Sandbox: Another Anthropic Claude Code Network Sandbox Bypass Enables Data Exfiltration," oddguan.com, 2026-05-20.
2. The Register, "Even Claude agrees the hole in its sandbox was real and dangerous," 2026-05-20.
