Microsoft Stops Internal Use of Claude Code as AI Token Costs Exceed Employee Expenses

On May 14, 2026, Microsoft began revoking internal licenses for Claude Code for most employees. The deadline is June 30—the end of Microsoft’s fiscal year.

Just six months earlier, Microsoft had been doing the exact opposite. In December 2025, it made Claude Code available to thousands of employees, including engineers, product managers, and designers, and encouraged everyone to reshape their workflows around vibe coding. Employees loved the tool, perhaps too much.

Now Microsoft has reversed course on its own initiative.

And in almost the same week, YC partner Tom Blomfield made a very different point during a batch talk: “If your API bill doesn’t hurt, you’re not burning enough.”

That same spring, Silicon Valley offered two completely opposite answers to the same question: is using AI really more expensive than employing humans?

01 The Failure Scene of Vibe Coding

Microsoft did not cancel the Claude model. Anthropic's models will continue to be available to Microsoft employees through the Copilot CLI. What was canceled is the Claude Code product entry point itself.

The department most affected is "Experiences + Devices"—the engineering teams behind Windows, Microsoft 365, Outlook, Teams, and Surface. EVP Rajesh Jha framed this decision in an internal memo as "toolchain unification," but according to internal Microsoft sources cited by The Verge, the reality is more straightforward: employees generally find Claude Code more usable than Copilot CLI, and Anthropic’s tool has become so popular within Microsoft that the company’s own Copilot CLI has been sidelined.

In other words, Microsoft removed Claude Code not because it was inadequate, but because it was too good.

The June 30 deadline wasn’t coincidental: it’s the end of Microsoft’s fiscal year. When a company cuts a tool its employees widely prefer, replaces it with an in-house product, and times the switch precisely to the fiscal year-end, it is not hard to guess how much of the decision is product-driven and how much is financially motivated.

Microsoft is not an isolated case.

A month ago, Uber’s CTO, Praveen Neppalli Naga, revealed to The Information that the company’s entire annual AI programming tools budget for 2026 was exhausted within the first four months. Previously, Uber had even created an internal leaderboard to incentivize employees to use AI more extensively through competitions—resulting in a budget meltdown.

More bluntly, Bryan Catanzaro, NVIDIA’s Vice President of Deep Learning, told Axios: “For my team, the cost of compute far exceeds the cost of employees.” This comes from a senior executive at a hardware company whose core product is selling compute power.

Fortune connected these dots and gave the article a very Fortune-style headline: “Microsoft’s Report Reveals the Real Cost of AI—Using This Is More Expensive Than Keeping Employees.”

If you only read up to this level, the conclusion is simple: vibe coding has failed, and the story of AI replacing humans can be put to rest.

But that conclusion is premature.

02 Copilot mode has hit a wall

To explain Microsoft's retreat, we first need to clarify what vibe coding is.

This term was introduced by Andrej Karpathy in early 2025—he described a new programming paradigm where developers no longer write code line by line, but instead describe their intent in natural language, allowing LLMs to generate the code. Developers don’t even read the code—they simply look at the results: if it runs, they accept it; if not, they ask the AI to fix it again.

This is the most compelling productivity promise of the AI era. It means: an engineer who doesn’t know Rust can have AI write Rust for them; a product manager can have AI create prototypes for them; a designer can have AI generate runnable code for them. The very groups Microsoft opened Claude Code to in December 2025—engineers, PMs, and designers—are precisely these three types. This is no coincidence; it’s the quintessential real-world application of vibe coding.

But when vibe coding is applied in large companies, it becomes something structurally awkward.

Suppose Microsoft has an engineer earning an annual salary of $300,000. After equipping him with Claude Code, his productivity increases by 20%—this is the ideal scenario for vibe coding. But at the same time, what is his monthly token cost: $200, $500, or $2,000? This figure will steadily rise as his dependence on AI deepens.

Worse still, he won’t be fired just because he used AI—his $300,000 salary, benefits, and desk remain intact.

In other words, Microsoft’s total cost structure is “original employee salaries + new token bills.” This formula only goes in one direction—costs surge.
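As a rough sketch of that formula, the snippet below adds twelve months of token spend to the salary from the example. The $300,000 salary and the three monthly token figures come from the scenario above; the function name is illustrative, and benefits and other overheads are deliberately left out.

```python
def annual_copilot_cost(salary: float, monthly_token_cost: float) -> float:
    """Yearly cost of one employee who keeps their salary and adds a token bill."""
    return salary + 12 * monthly_token_cost


SALARY = 300_000  # annual salary from the example above

# The three monthly token bills floated in the example.
for monthly in (200, 500, 2_000):
    total = annual_copilot_cost(SALARY, monthly)
    increase = (total - SALARY) / SALARY
    print(f"${monthly:>5,}/month in tokens -> ${total:,.0f}/year "
          f"({increase:.1%} above salary alone)")
```

Whatever the monthly figure turns out to be, the token term is only ever added; nothing in this structure subtracts from the salary line.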

But does “employee output +20%” translate to “revenue +20%” financially? No. It translates to “revenue remains unchanged, but the cost structure now includes an additional AI bill”—because most employees’ increased output does not directly correspond to new revenue; writing faster doesn’t mean the company sells more.

This is the true meaning behind Catanzaro’s statement that “computing power is more expensive than employees.” It’s not saying AI is dumb—it’s saying that when you try to plug AI into the role of an existing employee, you simply can’t make the numbers work.

This logic is supported by data.

A recent Gartner forecast predicts that by 2030, the inference cost of trillion-parameter large models will decrease by nearly 90% compared to 2025. While this suggests AI is becoming cheaper, Gartner’s true conclusion is that this will not reduce enterprises’ overall AI expenses. Senior Director Analyst Will Sommer from Gartner stated: “CPOs should not conflate ‘deflation of commodity tokens’ with ‘democratization of cutting-edge inference capabilities.’”

Goldman Sachs’ prediction is more direct: by 2030, agentic AI will drive a 24-fold increase in token consumption, to 120 quadrillion tokens per month. Put a 90% drop in per-token price against a 24-fold increase in consumption and the total bill still rises, to roughly 2.4 times its starting level.
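The arithmetic behind that claim is short enough to check directly. The sketch below is illustrative only: it assumes the Gartner price drop and the Goldman Sachs consumption growth can be applied to the same baseline bill, which neither forecast states, and the function name is made up for this example.

```python
def bill_multiplier(price_drop: float, volume_growth: float) -> float:
    """Factor by which the total token bill changes.

    price_drop    -- fractional fall in per-token price (0.90 means 90% cheaper)
    volume_growth -- factor by which token consumption grows (24 means 24x)
    """
    return (1 - price_drop) * volume_growth


# Assumption: both forecasts are measured against the same baseline.
multiplier = bill_multiplier(price_drop=0.90, volume_growth=24)
print(f"Total token bill vs. baseline: {multiplier:.1f}x")

# Consumption growth needed just to keep the bill flat after a 90% price drop.
break_even_growth = 1 / (1 - 0.90)
print(f"Break-even consumption growth: {break_even_growth:.0f}x")
```

On these assumptions, consumption would have to grow by less than about tenfold for cheaper tokens to translate into a smaller bill.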

Jensen Huang has an even more ambitious version. Several months ago, he publicly stated that in the future, every NVIDIA employee will work alongside 100 AI agents.

Sounds great. But if you're the CFO, what do you hear? A hundred agents per employee, burning tokens 24 hours a day, nonstop.

The issue isn't that AI is too expensive. The issue is the assumption itself of giving every employee an AI co-pilot.

This posture has a popular name in the tech community—“copilot mode.” Its core assumption is that humans remain in the driver’s seat while AI sits in the passenger seat, offering suggestions. It doesn’t replace you; it simply helps you go faster.

This assumption is gently phrased on the surface—“AI won’t take your job; AI will just help you.” But financially, its implication is: your original salary remains unchanged, but an additional token fee is introduced.

Tokens are not a fixed fee; they are billed based on consumption. The more employees use, the more the company pays. That is exactly the type of cost structure businesses least want: variable, uncapped, and growing as employees lean on the tool more heavily.

When Microsoft opened access to Claude Code in December 2025, it may not have fully realized this. Its intention was simply to let employees try it out and see how much AI could improve productivity. But six months later, employees had become truly hooked—Claude Code became wildly popular within Microsoft, resulting in token bills far exceeding what Microsoft could recoup from the increased output.

Microsoft has pulled back. But what it pulled back on isn't AI—it's the structure where employees remain in the driver's seat and AI sits in the passenger seat.

This is a structural failure. It will not disappear because the model is cheaper, nor because employees are more skilled—it will worsen as employees become more proficient with AI.

03 Burn tokens, not people

Almost the same week as Microsoft’s retreat, Tom Blomfield presented a completely different perspective during YC’s batch talk. He didn’t discuss “how to use AI”—he discussed “what companies should look like in the age of AI.”

Blomfield’s assessment is straightforward: Today, most companies still operate like Roman legions—information flows upward in layers, commands are issued downward in layers, and people are the core of coordination. Adding AI to this structure is like giving firearms to Roman legionaries—they’ll use them more aggressively, but their tactics won’t change.

A true AI-native company should look different.

Blomfield used a specific description: every action should produce a recordable, callable output that makes everything legible to AI; the company should be designed as a "self-improving AI loop," where the system perceives its environment, makes decisions, invokes tools, receives feedback, and self-corrects.

In this type of company, there are only two roles: individual contributors—everyone, regardless of department, is a builder and operator, bringing prototypes to meetings, not just ideas; and DRIs (Directly Responsible Individuals)—every output has a clearly assigned owner, and “you can’t hide behind AI.”

Then Blomfield delivered the famous quote: "If your API bill doesn't hurt, you're not burning enough."

This statement would be laughed off in Microsoft’s CFO’s office, but among a room full of startup founders at YC, no one thinks it’s crazy.

Why?

Another YC partner, Diana Hu, provided the answer at Startup School in early May. She said: “What matters most isn’t headcount, but token consumption.” She also offered a more straightforward version: “One person with AI tools equals a large engineering team of the past.”

Note the keyword here: "equals." Not "assists," not "is similar to." It's a replacement.

In YC’s Spring 2026 batch (P26), many companies are now using just five or six people to accomplish what previously required 20 or 30. Their token bills are naturally high, but their personnel costs are extremely low, so on balance they come out ahead.

A more aggressive example is Block. The fintech company founded by Jack Dorsey recently laid off 40% of its workforce. This isn’t traditional “cost-cutting and efficiency improvement”: Block has simultaneously increased internal investment in AI tools, adopting the new structure described above: IC + DRI + AI agent.

In YC’s context, burning tokens is not an added expense; it’s a replacement. What it replaces is not some other line item but human salaries. The books balance because the company has simultaneously eliminated the positions that would otherwise have required those salaries.

This is the fundamental reason why Microsoft and YC look at the same thing but give opposite answers: they are not burning tokens for the same purpose. Microsoft’s tokens fuel a co-pilot, while YC’s tokens replace the original driver.
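The two ledgers can be put side by side with made-up numbers. In the sketch below, only the team sizes echo the article (six people doing work that once took about 25); the salary and per-person token spend are hypothetical values chosen purely for illustration, as are the class and variable names.

```python
from dataclasses import dataclass


@dataclass
class Team:
    headcount: int
    salary_per_person: float        # hypothetical annual salary
    tokens_per_person_month: float  # hypothetical monthly token spend

    def annual_cost(self) -> float:
        """Salaries plus twelve months of token bills for the whole team."""
        per_person = self.salary_per_person + 12 * self.tokens_per_person_month
        return self.headcount * per_person


# All dollar figures below are assumptions, not data from the article.
teams = {
    "traditional (no AI)": Team(25, 200_000, 0),
    "copilot (same people + tokens)": Team(25, 200_000, 1_000),
    "replacement (fewer people, heavy tokens)": Team(6, 200_000, 20_000),
}

for label, team in teams.items():
    print(f"{label:<42} ${team.annual_cost():>12,.0f}")
```

Under these assumptions the copilot team costs more than the traditional one, while the small team spends far more on tokens per person yet less in total. The comparison only holds if the six people really do deliver the same output, which is the premise the YC argument rests on.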

04 Real assets are being redefined

Tom Blomfield made another, more thought-provoking remark in the same talk: “People are temporary; context documents are what matter.”

This is a claim about accounting.

How is a traditional company’s balance sheet structured? On the left are fixed assets, accounts receivable, goodwill, and IP; on the right are liabilities and shareholders’ equity. Employees do not appear on the asset side—they are treated as expenses. But every company knows internally that employees are the real assets: customer relationships reside in the sales team’s minds, business intuition lives in the product manager’s mind, and technical know-how resides in the engineers’ minds.

This kind of "asset" has a habit of walking out the door: when employees leave, the asset leaves with them.

Blomfield describes AI-native companies as doing one thing: extracting all assets that previously existed only in human minds and turning them into AI-readable, AI-callable, and AI-iterable "contextual assets."

What does this look like in practice? It’s detailed requirement documents; process documentation capturing every decision, every email exchange, every Slack discussion; open MCP interfaces and APIs; artifacts generated by every internal tool—all of these together form a company’s new, inheritable asset layer that doesn’t disappear when employees leave.

In such companies, people become mere "variables," easy to onboard and easy to offboard, because the company’s core assets reside not in human minds but in documentation.

If this structure holds, it means more than just a new organizational model—it means corporate balance sheets are being rewritten. An AI-native company with just six people and a staggering token bill may appear financially unhealthy, but its true assets could be greater than those of a traditional company with sixty employees—only current accounting standards haven’t yet learned how to value these assets.

In other words, vibe coding isn't dead—it just doesn't belong in traditional companies.

The day Microsoft removed Claude Code was not the day the AI economy failed. It was the day the approach of fitting AI into old organizations disproved itself.

In that room full of startups at YC, another approach is emerging—they are small, they burn cash, they don’t track “employee AI adoption rates” on their KPI dashboards, and their CFOs don’t panic over skyrocketing token bills—because what they’re replacing isn’t “the employee’s co-pilot,” it’s the employee themselves.

In the coming years, mid-sized companies that are still telling their employees to "use AI a bit more" will hit the same wall Microsoft hit: structurally rising token bills.

But the real reason for the collision isn't that AI is too expensive—it's that organizations haven't changed.

And most companies probably won’t change anytime soon.