By Lin Wanwan

In 1876, at the Philadelphia World’s Fair, Emperor Pedro II of Brazil picked up Bell’s telephone, heard the voice on the other end, and exclaimed: “My goodness, it talks!”

One hundred and fifty years later, on March 18, 2026, at the San Jose Convention Center, Jensen Huang, wearing a black leather jacket, stood on the GTC stage and said something astonishing.

“In ten years, NVIDIA will likely have 75,000 employees. They will be extremely, extremely busy working alongside 7.5 million AI agents.”

The audience laughed.

75,000 people, 7.5 million agents, 1:100.

Huang also laughed and added, “They’ll work around the clock. Hopefully, our people don’t have to compete with them.”

The applause faded, and this number was drowned out by flashier chip launches and partnership announcements that day. But let’s take a moment to consider it on its own—this could be one of the most important statements of the entire conference.

It's not just Jensen Huang. Three months ago, another person described this same future in even greater detail.

January 2026, CES in Las Vegas. McKinsey CEO Bob Sternfels sat on stage and laid out the numbers.

“We currently have 40,000 human employees and approximately 25,000 AI agents. Less than two years ago, these numbers were in the thousands. Those 25,000 agents generated 2.5 million charts over the past six months.”

2.5 million charts. That used to be the job of newly hired analysts: twenty-three or twenty-four years old, holding degrees from top global universities, aligning axes at 3 a.m.

That’s where every McKinsey new hire begins, trading the most mechanical labor for a ticket to the partnership track.

Now, the first half of this ticket has been taken over by the agent. Sternfels says: AI has increased some roles by 25% and reduced others by 25%. The company has been cleanly split in half—half expanding, half contracting.

The story of NVIDIA and the story of McKinsey are about the same thing.

In a 1:100 world, workers are token-driven agents, and humans are interfaces connected to those agents.

The remote control for the cheat is not in your hands.

During GTC week, Jensen Huang appeared on the All-In Podcast and said something even more impactful.

“Suppose you have an engineer earning $500,000 per year. If they haven’t consumed at least $250,000 worth of tokens, I would be very concerned.”

The host pressed further on whether NVIDIA was spending $2 billion to buy tokens for its engineering team, and Jensen Huang replied, “We are working on it.”

An engineer who doesn't burn tokens isn't worth $500,000, even if they're paid $500,000.

NVIDIA's approach is straightforward: include tokens in compensation packages. At the GTC keynote, Jensen Huang said that in the future, every NVIDIA engineer will receive an annual token budget equivalent to about half of their base salary.

An engineer with a base salary of hundreds of thousands of dollars receives an additional allocation of inference compute worth half that salary, so one-third of the combined package is pure fuel.
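The one-third figure follows directly from the arithmetic. Here is a minimal sketch, assuming a hypothetical base salary of $300,000 (the keynote, as reported here, gave only the one-half ratio, not a dollar amount) and counting only base salary plus token budget, with bonus and equity left out:

```python
def token_share_of_package(base_salary: float, token_ratio: float = 0.5) -> float:
    """Fraction of (base salary + token budget) made up by the token budget."""
    token_budget = base_salary * token_ratio
    return token_budget / (base_salary + token_budget)


# Hypothetical base salary; only the 0.5 ratio comes from the article.
base = 300_000
share = token_share_of_package(base)
print(f"token budget: ${base * 0.5:,.0f}, share of package: {share:.1%}")
```

The base salary cancels out of the fraction, so the token share stays at one-third whatever salary is plugged in.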

Someone with a full token budget has the equivalent of a dozen AI agents working around the clock to write code, run tests, search literature, and run simulations. Someone limited to the free API tier is still typing everything by hand. Their resumes might be identical, but their output differs by 5 to 10 times.

This is no longer theoretical in Silicon Valley.

In March this year, Business Insider reported a shift: engineers are now being asked during interviews, “What is the token budget for this role?” Tomasz Tunguz, partner at Theory Ventures, calls token allocation the “fourth pillar” of engineer compensation, following base salary, bonus, and equity. OpenAI President Greg Brockman put it more directly: your ability to access computational resources will increasingly determine your overall productivity.

Jensen Huang himself said in his GTC speech: “How many tokens come with my job? That has become a recruiting tool in Silicon Valley.”

In the 1950s, auto workers in Detroit earned some of the highest wages in the United States. What truly lifted them into the middle class was the assembly line that Henry Ford had pioneered. Workers stood in place while the line moved past them, and the line’s machinery amplified each worker’s output dozens of times over. A Detroit worker’s standard of living far exceeded that of contemporary craftsmen, not because the worker’s skills were necessarily better, but because the worker stood on a much larger assembly line.

The 2026 token budget is like the 1950s assembly line.

But there is one difference.

Detroit workers who left Ford could go to General Motors or Chrysler; assembly lines were everywhere. Unions could negotiate with management for better line speeds and safer working conditions.

Token allocations are different. On the day the company gives you tokens, you’re a superhero; on the day they take them back, you’re just an ordinary person. Stocks can be cashed out and taken with you; skills go with you when you change jobs. Token allocations are nothing more than a cheat code—the power to turn them on or off lies entirely in the company’s hands.

Silicon Valley has coined a new term to describe this situation: "GPU hunger."

Top AI researchers are switching jobs, and salary now ranks second among their reasons; compute tops the list. Without the ability to run experiments or deploy agents, their capabilities are capped by quotas. “How many tokens are you offering?” sometimes comes before stock options. Stock is a distant check that might depreciate; tokens are productivity that can be redeemed today.

Those who don’t use AI are out.

Goldman Sachs estimates that AI could automate 25% of work hours in the U.S. A Mercer survey finds that 65% of executives expect 20% to 30% of employees to be reallocated because of AI. Put the two figures together and the conclusion is clear: those with tokens see explosive productivity gains, while those without are optimized out.

The dividing line has more and more to do with token quotas and less and less to do with human ability.

Token throughput is the valuation.

An individual's value is determined by their token allocation. What about a company?

In early March 2026, a Shanghai-based company called MiniMax released its first annual report since going public. It reported annual revenue of $79 million and an adjusted net loss of $250 million. By traditional financial metrics, this is a cash-burning startup with revenue amounting to only a fraction of Accenture’s quarterly earnings.

But the capital market does not see it that way.

Yan Junjie, CEO of MiniMax, said something during the earnings call that matters more than the entire report: "The company's value is determined by intelligence density multiplied by token throughput."

Token throughput, not revenue growth rate, not user count, not gross margin.

The data supporting this statement is solid. In February 2026, MiniMax’s M2 series models saw their daily token consumption increase sixfold compared to December two months prior. Token consumption in programming scenarios rose tenfold. On the AI model aggregation platform OpenRouter, MiniMax’s M2.5 consumed 4.55 trillion tokens in two weeks, surpassing all U.S. models and making a Shanghai-based company the first to top the global token consumption leaderboard.

The South China Morning Post wrote that China's open-source models had ended a year of market dominance by American developers. What ended it? Token consumption. Whoever's tokens are burned the most wins.

The same logic applies to OpenAI. OpenAI’s API platform processes 60 billion tokens per minute, a 20-fold increase over two years. The number of enterprise customers spending over $100,000 annually has grown nearly sevenfold. After analyzing the data, Barclays analyst Ross Sandler concluded that OpenAI’s consumer token consumption is more than twice that of Google Gemini.
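As a rough back-of-the-envelope check on what the API figures imply, the sketch below converts the per-minute rate into a daily total. It also derives the yearly multiple behind a 20-fold increase over two years, assuming growth compounded at a steady annual rate (an assumption for illustration, not something the source states):

```python
import math

# Figures quoted in the article.
TOKENS_PER_MINUTE = 60_000_000_000   # 60 billion tokens per minute
GROWTH_OVER_TWO_YEARS = 20           # 20-fold increase over two years

# Daily volume if the quoted per-minute rate held for 24 hours.
tokens_per_day = TOKENS_PER_MINUTE * 60 * 24

# Assumption: steady compounding, so the annual multiple is the square root.
annual_multiple = math.sqrt(GROWTH_OVER_TWO_YEARS)

print(f"tokens per day: {tokens_per_day / 1e12:.1f} trillion")
print(f"implied annual growth: about {annual_multiple:.2f}x per year")
```

Under that assumption, a 20-fold rise over two years works out to somewhat more than quadrupling each year.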

Token consumption has become the standard metric for ranking AI companies.

More interestingly, here’s how this looks inside the company. The New York Times recently reported on a phenomenon called “tokenmaxxing”: engineers at Meta and OpenAI compete on internal leaderboards to see who consumes the most tokens.

Token budgets are becoming a standard benefit, much like free lunches and dental insurance a decade ago. An engineer working at Ericsson’s Stockholm office told the New York Times that he may be spending more on Claude than his salary, but the company covers it.

Last week, a TechCrunch article did the math: an engineer writing an article in the afternoon might use 10,000 tokens, but an engineer running an agent cluster can burn through millions of tokens in a day without typing a single word.

Two years ago, the price per million tokens was $33. Now, it’s 9 cents—a 99.7% drop. The cheaper the price, the more aggressively it’s burned. The more it’s burned, the more essential it becomes.
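The size of that price drop, and what it does to the cost of an agent-heavy workday, can be checked in a few lines. This is a sketch only: the two prices are the ones quoted above, while the 5 million tokens per day workload is a hypothetical stand-in for "millions of tokens in a day."

```python
def percent_drop(old_price: float, new_price: float) -> float:
    """Percentage decline from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


def daily_cost(tokens_per_day: int, price_per_million: float) -> float:
    """Dollar cost of a day's token usage at a given price per million tokens."""
    return tokens_per_day / 1_000_000 * price_per_million


OLD_PRICE = 33.00   # dollars per million tokens, two years ago (from the article)
NEW_PRICE = 0.09    # dollars per million tokens, now (from the article)

# Hypothetical workload for an engineer running an agent cluster.
AGENT_TOKENS_PER_DAY = 5_000_000

print(f"price drop: {percent_drop(OLD_PRICE, NEW_PRICE):.1f}%")
print(f"daily cost at old price: ${daily_cost(AGENT_TOKENS_PER_DAY, OLD_PRICE):.2f}")
print(f"daily cost at new price: ${daily_cost(AGENT_TOKENS_PER_DAY, NEW_PRICE):.2f}")
```

At the old price, the same workload would cost hundreds of times more per day, which is why usage at that scale only makes sense after the drop.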

Yan Junjie predicted on the earnings call that demand for tokens could grow by one to two orders of magnitude.

This is the new way to value a company in 2026: not by how much profit it makes, but by how many of its tokens have been burned. MiniMax is losing $250 million, but its token throughput growth curve is terrifyingly steep—and the capital market is willing to bet on it. You can think of it like YouTube in 2006: zero revenue, but bandwidth consumption growing exponentially, and Google was willing to pay $1.65 billion for it.

Back then, YouTube burned bandwidth. Today, MiniMax burns tokens. The unit of measurement has changed, but the logic remains the same.

Production capacity can wait, but debt cannot.

In the same week as GTC, another event occurred.

On March 18, Stripe released the Machine Payments Protocol. In simple terms: AI agents can now spend money on their own.

An agent that needs a dataset pays to download it on its own. It buys computing power by the second to run inference, and it pays to call another agent’s API. None of this requires human confirmation. Visa has adapted credit card payments to the protocol, Coinbase has created agent-specific wallets, and Mastercard is developing Agent Pay.

The consumption of tokens now has an additional source. Previously, the only scenario was "humans dispatching agents." Now, agents themselves are consuming tokens, and using the money earned from tokens to buy more tokens. John Collison, co-founder of Stripe, used one word to describe it: flood.

Jensen Huang provided the corresponding figures on stage: NVIDIA aims to increase the token generation rate from 2 million to 700 million, a 350-fold improvement.

It's like building an entire highway network, betting that vehicle traffic will grow exponentially.

A $600 billion infrastructure bet rests on one premise: global token consumption must be large enough to justify the return on investment. For now, that premise is only an assumption, and a very expensive one.

In the final quarter of 2025, technology companies issued a record $108.7 billion in bonds. In the first weeks of 2026, another $100 billion followed. Morgan Stanley and JPMorgan estimate that total debt issued by AI-related companies over the coming years could reach $1.5 trillion. According to Goldman Sachs, AI capital expenditures now account for approximately 3% of U.S. GDP.

Some of the first people on Wall Street to sense the risk have already started buying insurance. Trading volumes in credit default swaps are rising. For just a few dozen basis points in premium, investors are betting that these companies may default on their debts. Daniel Sorid, Citi’s head of credit strategy, said at an investor meeting: “As a credit investor, the scale of this transformation and the massive capital required naturally raises unease.”

Google co-founder Larry Page put it more starkly inside the company, repeatedly telling Google employees: "I'd rather go bankrupt than lose this race."

This is precisely a prisoner's dilemma: each giant is betting that the others will keep investing, so none can afford to stop. Whoever stops is eliminated immediately.

On the positive side, there is hard data: the token generation rate has increased 350-fold. Stripe has just allowed agents to spend their own money. McKinsey has scaled from thousands of agents to 25,000 within two years. If the agent economy takes off fully, the growth curve of token consumption could indeed turn exponential.

But there’s one date that’s keeping many people up at night: the renewal cliff in the second half of 2026.

From 2024 to 2025, companies spent their “innovation budget.” CEOs needed to say on earnings calls, “We’re embracing AI,” so price sensitivity was low and expectations were relaxed; the spending was for show. In the second half of 2026, the first pilot projects come up for renewal. The innovation budget is exhausted; the CTO vacates the seat across the table, and the CFO takes it. The CFO recognizes only one number: ROI.

If a large number of pilot projects are canceled, end-user token consumption will suddenly fall short. The capacity built upstream with that $600 billion (data centers constructed, power connected, chips installed) will sit idle.

This has happened before in history.

In 2000, telecom companies spent trillions of dollars laying undersea fiber-optic cables. When the bubble burst, 90% of global cable capacity lay dark and unused for nearly a decade, until Netflix began streaming and the iPhone ignited the mobile internet, lighting up the cables one by one. The cables were not laid in vain. But of the companies that built them, WorldCom and Nortel went bankrupt, and Lucent shrank to a fraction of its former size before disappearing into a merger. The infrastructure remains, but the builders are gone.

In 2012, it was China’s photovoltaic industry. Wuxi Suntech and Jiangxi LDK Solar drove module prices below global cost lines, and severe overcapacity led to a three-year industry purge. Demand eventually arrived: today, solar power is the fastest-growing energy source on Earth. Suntech went bankrupt. LDK went bankrupt. The pioneers fell in the last darkness before dawn.

After Bell invented the telephone, Western Union refused to buy the patent for $100,000. A decade later, Western Union was willing to pay $25 million, but Bell refused to sell. Thirty years later, telephone networks spanned the entire United States. But most of the small companies that built those networks didn’t survive until the telephone became widespread. The winner was AT&T, which later acquired and monopolized everything.

The story of infrastructure is always this version. The direction is almost always right, but timing can be deadly.

Back to tokens. The structure discussed earlier, in which tokens become labor and humans become interfaces, relies on tokens being consumed continuously, massively, and at an accelerating pace. Engineers’ tenfold productivity is sustained by token supply; cut it, and output drops to zero. OpenAI’s $840 billion valuation rests on compute commitments; terminate the agreements, and the value shrinks. The $600 billion in infrastructure depends on growing end-user consumption; if growth slows, it runs idle.

Each layer depends on the layer below it. If consumption growth lags construction growth by two to three years, pricing across the entire chain will have to adjust.

Which railway are you relying on?

In 2023, whoever had the GPUs was the boss. In 2026, whoever has the tokens is the boss.

It sounds like just a change of wording, but the underlying changes run deeper than most people realize.

A GPU is an asset: once you buy it, it’s yours, locked in a data center and inaccessible to others.

Tokens are traffic. Your tenfold output, your high valuation, your leverage at the negotiation table—all rest on a continuous supply that isn’t yours. Turn off the faucet, and everything goes to zero.

When tokens become actual working labor, humans become interfaces connected to those tokens. Good interfaces allow tokens to generate greater value—judgment, taste, and experience still matter. But how much an interface can accomplish depends first on how many tokens it is connected to.

In the 1870s, American farmers discovered that growing good wheat wasn’t enough; you needed to be next to the railroad. In the 1950s, skilled artisans realized that no matter how good their craft, they couldn’t compete with workers on assembly lines. Today’s engineers are discovering that no matter how elegant their code, without a token budget they are just spinning their wheels.

When tokens become real labor, humans become interfaces. The quality of the interface itself still matters, but its value depends first and foremost on who is powering it.

