Editor’s Note: As AI begins writing code, handling customer support tickets, and reviewing legal documents, a more fundamental question is emerging: What are companies truly purchasing—tokens, GPU hours, or completed work?

This article proposes a framework worth noting: the commercialization of AI should not be understood merely as a "computing power market" or a "model invocation market," but is evolving into a new "machine labor market." In this market, tokens are merely units of measurement, GPUs are inputs, models are production tools, and what is truly priced and traded is the economic labor performed directly by software.

The core judgment of the article is that AI pricing mechanisms will evolve from raw tokens and standardized model capabilities, to industry-specific labor, and finally to programmable result markets. In other words, future enterprises may no longer care which model or which GPU performs a given task, but rather whether it delivered results meeting established standards within specified latency, accuracy, reliability, and cost parameters.

This also means that the impact of AI on the human labor market is not merely a simple replacement. As machines take on more standardized and verifiable tasks, human roles may shift toward reviewing, assuming responsibility, managing context, and making final judgments. In certain scenarios, the final 1% of human judgment may become even more valuable, as it enables the scaling of 99% automation.

From this perspective, the next stage of competition in the AI market may no longer be just about model capabilities or pure computational power price wars, but rather who can first standardize, verify, and price “work,” ultimately making machine labor a new type of production factor that can be procured, settled, and traded.

The following is the original text:

Past waves of productivity have always come from tools and software created for humans to optimize how work is done. Spreadsheets aided accountants and analysts, conveyor belts increased throughput, and hammers amplified human leverage. But the true labor has always come from humans.

Now, AI is producing work end-to-end, directly performing labor itself. It can write code, handle customer service tickets, and review legal documents. The bottom of the tech stack is being compressed: the old stack supported labor, while the new stack begins to produce labor.

If you’ve recently heard discussions about the financialization of AI, you’ve likely heard Jensen and others say that LLM tokens and/or GPU hours are becoming new commodities. This intuition makes sense—tokens are measurable, billable, and easy to chart; billions of dollars are flowing into GPU hours. But tokens are merely meters, and GPU hours are merely inputs—no one buys them for their own sake. What people truly want is to get work done. AI is turning the technology stack itself into a source of labor.

Machine labor: Work performed by software, with economic utility, and sold into the production process.

The market is already moving in this direction. Benchmark’s Sarah Tavel prefers to understand this opportunity through the lens of outsourced labor markets rather than software categories. If a repetitive task is typically handled by dedicated offshore teams or professional services firms, it’s often also a task well-suited for AI delivery. Alex Rampell at a16z calls this “software eating labor”: the next act of software is doing the work itself. Julien Bek at Sequoia describes the same shift from another angle: services are becoming software—copilots sell tools, while autopilots sell work.

The missing market behind outcome pricing

Seat-based pricing charges for access, token-based pricing charges for usage, and outcome-based pricing charges for completed work. Outcome-based pricing is a step forward, but it still leaves one question unanswered: who sets the price?

If machine labor could be purchased directly, prices should arise from competition among suppliers. These suppliers must be able to meet the same standards for completing similar tasks or work, which requires standardization within and across industries and tasks.

The current approach uses LLM tokens, but the raw token is merely the underlying unit. A barrel of oil is just a unit of measurement; what is actually traded are barrels of specific grades of oil, with defined quality, delivery terms, and market prices. A barrel of Brent crude is not the same commodity as a barrel of high-sulfur heavy crude. The same applies to LLM tokens: tokens are merely units of measurement; what truly matters is the intelligence behind them, meaning model quality, benchmark baselines, latency, context window, reliability, and delivery guarantees. One million tokens from a cutting-edge code model are not the same commodity as one million tokens from a cheap general-purpose model. The market needs standardized inference grades, just as the energy market needs standardized oil grades.

Anjali Shriva directly pointed out that a token is not a fixed cost unit; its economics vary with context length, task structure, input/output ratio, number of retries, tool calls, and Agent workflows. A token in a short prompt is not the same economic entity as a token buried within a long Agent loop.
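To make the point concrete, here is a minimal sketch of how the cost of a task, and even the blended cost per token, shifts with task structure. The `TaskProfile` fields, the `cost_per_task` helper, and the prices are hypothetical illustrations and are not taken from any provider's rate card.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of how one task consumes tokens."""
    input_tokens: int         # prompt plus context sent on each model call
    output_tokens: int        # tokens generated on each model call
    calls_per_attempt: int    # 1 for a single prompt, more for an agent loop
    expected_attempts: float  # average number of attempts, including retries


def cost_per_task(profile: TaskProfile, usd_per_m_input: float,
                  usd_per_m_output: float) -> float:
    """Expected USD cost of one completed task at per-million-token prices."""
    per_call = (profile.input_tokens * usd_per_m_input
                + profile.output_tokens * usd_per_m_output) / 1_000_000
    return per_call * profile.calls_per_attempt * profile.expected_attempts


# Illustrative prices only (USD per million input / output tokens).
PRICE_IN, PRICE_OUT = 3.0, 15.0

short_prompt = TaskProfile(input_tokens=500, output_tokens=300,
                           calls_per_attempt=1, expected_attempts=1.0)
agent_loop = TaskProfile(input_tokens=40_000, output_tokens=1_000,
                         calls_per_attempt=12, expected_attempts=1.5)

for name, p in [("short prompt", short_prompt), ("agent loop", agent_loop)]:
    tokens = (p.input_tokens + p.output_tokens) * p.calls_per_attempt * p.expected_attempts
    usd = cost_per_task(p, PRICE_IN, PRICE_OUT)
    print(f"{name}: ${usd:.4f} per task, "
          f"${usd / tokens * 1e6:.2f} per million tokens consumed")
```

Under these assumed numbers the same price list yields very different per-task costs and different blended per-token rates, because the input/output mix, the number of calls, and the retries all differ.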

We have long done this in the human labor market. No one hires a radiologist as a generalized “human hour.” People consider training background, licensing and certification, specialization, years of experience, availability, reputation, and accountability. Different human contract specifications correspond to different minimum standards and tiered expectations.

Human labor markets have always operated based on these criteria, though these criteria are often mixed, qualitative, and filled with various proxy indicators. Machine labor will make these criteria more explicit and more quantifiable.

For LLMs or agents, metrics such as skills, experience, speed, and reliability can all be directly written into contracts: benchmark scores, latency, throughput, context window, maximum output length, tool usage accuracy, uptime, and error rate. We can procure labor based on quantifiable expectations and outcomes.

TheGrid.ai's contract specifications serve essentially as a qualification filter, combined with price competition for LLM outputs. Suppliers that meet the specifications are eligible to compete:

Intelligence benchmark ≥ minimum

Latency ≤ limit

Throughput ≥ minimum

Uptime ≥ minimum

Error rate ≤ limit

Once all suppliers meet the same minimum threshold, they begin to compete on price. Buyers ask: Which supplier can deliver the required labor at the best price?
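The two-step logic above (qualify first, then compete on price) can be sketched as follows. This is an illustration under stated assumptions and does not show TheGrid.ai's actual implementation: `ContractSpec`, `Supplier`, `qualifies`, and `best_offer` are hypothetical names, and the supplier figures are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ContractSpec:
    """Hypothetical minimum thresholds a supplier must meet."""
    min_benchmark: float
    max_latency_ms: float
    min_throughput_tps: float
    min_uptime: float
    max_error_rate: float


@dataclass(frozen=True)
class Supplier:
    name: str
    benchmark: float
    latency_ms: float
    throughput_tps: float
    uptime: float
    error_rate: float
    usd_per_m_tokens: float


def qualifies(s: Supplier, spec: ContractSpec) -> bool:
    """Step 1: the spec acts as a pass/fail qualification filter."""
    return (s.benchmark >= spec.min_benchmark
            and s.latency_ms <= spec.max_latency_ms
            and s.throughput_tps >= spec.min_throughput_tps
            and s.uptime >= spec.min_uptime
            and s.error_rate <= spec.max_error_rate)


def best_offer(suppliers: List[Supplier], spec: ContractSpec) -> Optional[Supplier]:
    """Step 2: among qualified suppliers, the lowest price wins."""
    eligible = [s for s in suppliers if qualifies(s, spec)]
    return min(eligible, key=lambda s: s.usd_per_m_tokens, default=None)


spec = ContractSpec(min_benchmark=70, max_latency_ms=800,
                    min_throughput_tps=50, min_uptime=0.999, max_error_rate=0.01)
suppliers = [
    Supplier("A", 82, 600, 90, 0.9995, 0.004, usd_per_m_tokens=4.00),
    Supplier("B", 75, 700, 60, 0.9990, 0.008, usd_per_m_tokens=2.50),
    Supplier("C", 90, 950, 120, 0.9999, 0.001, usd_per_m_tokens=1.00),  # too slow
]
winner = best_offer(suppliers, spec)
print(winner.name if winner else "no qualified supplier")
```

Note the design choice: exceeding the thresholds earns no extra credit here. Supplier C is cheapest and scores highest on the benchmark, but it misses the latency limit and is excluded before price is considered.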

For LLMs, hiring a radiologist becomes a measurable problem: which models can read X-rays with high proficiency while meeting the contract's latency, context-window, and other outcome-based specifications?

The result is how buyers measure success; labor is the economic activity being supplied; and tokens are the fuel consumed by the machine as it completes the work.

The Grid is the machine labor market.

From tokens to the machine labor market

Markets can price inputs to the tech stack, but pricing outputs requires a machine labor market. Buyers don’t care about GPU hours. Model endpoints themselves are also unstable: they get renamed, deprecated, bundled, or outright retired.

Users and liquidity dislike frequent changes. GPUs and models will continue to evolve, but the stable unit is the work itself.

I believe the market will evolve along the following path: as you move up each level, what is purchased becomes more abstract and more valuable, but also harder to verify. The Grid should gradually ascend this ladder:

Raw tokens → Commoditized LLM capability market → Commoditized labor market → Programmable outcomes market

Phase 1: Raw tokens

Claude 4.7, GPT 5.5, Kimi 2.6, DeepSeek V4, GLM 5, and more.

Today, buyers purchase raw model outputs from inference providers. They send their own prompts, receive inference results, and pay based on usage. This is easy to verify, but it remains merely raw material. What buyers truly want is not tokens, but useful intelligence at the best possible price.

Phase 2: Commoditized LLM capability market

For example, text/usd, code/usd, agent/usd, etc.

Buyers no longer select a specific model; they choose the category of intelligence they need. Buyers still retain control over workflows, prompts, data, and application logic. The Grid simply routes each request to the lowest-priced model that qualifies under the contract specifications.

Note: This is the first true abstraction layer above raw tokens, and the current position of TheGrid.ai.

Phase 3: Commoditized labor market

For example, accounting/usd, support_agent/usd, legal/usd, healthcare/usd, radiology/usd, etc.

As models become more specialized, the capability market can further evolve into industry-specific markets, similar to how humans specialize in different labor markets.

At this layer, we sell inference tailored to the workflows of specific labor verticals. As industry-specific models become more common, this market will expand rapidly. Relevant examples include Cursor’s Composer, Harvey for legal work, and OpenEvidence for healthcare.

Phase 4: Agent-programmable RFQ and outcome markets

For example, support_ticket_resolved/usd, pr_merged/usd, claim_processed/usd, etc.

The final layer is where The Grid moves from the inference market to the machine labor market.

This layer requires mechanisms such as RFQ (request for quote), escrow accounts, delayed settlement, buyer confirmation, supplier reputation, clawbacks, and dispute resolution. It is likely to start with RFQs rather than an order book: the buyer defines the scope of work, constraints, acceptance criteria, and settlement terms, and agents bid to complete the task. The Grid assists with routing, pricing, verifying, and settling these tasks.
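As a rough illustration of how these pieces could fit together, the sketch below models one RFQ as a small state machine: bids, award to the lowest quote, funds held in a custodial (escrow) account, buyer confirmation, and a clawback if the result later fails. Everything here is an assumption for illustration: the `RFQ` class, its states, and the lowest-price award rule are hypothetical, and timing of delayed settlement, reputation, and dispute resolution are left out.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, Optional


class State(Enum):
    OPEN = auto()         # accepting bids
    AWARDED = auto()      # winner chosen, buyer funds held in escrow
    DELIVERED = auto()    # supplier claims the work is done
    SETTLED = auto()      # buyer confirmed, escrow released to supplier
    CLAWED_BACK = auto()  # payment returned to buyer after a failed result


@dataclass
class RFQ:
    task: str
    acceptance_criteria: str
    max_price_usd: float
    state: State = State.OPEN
    bids: Dict[str, float] = field(default_factory=dict)  # supplier -> quote
    winner: Optional[str] = None
    escrow_usd: float = 0.0
    paid_usd: float = 0.0

    def _require(self, expected: State) -> None:
        if self.state is not expected:
            raise ValueError(f"expected state {expected.name}, got {self.state.name}")

    def bid(self, supplier: str, price_usd: float) -> bool:
        """Record a quote; quotes above the buyer's ceiling are rejected."""
        self._require(State.OPEN)
        if price_usd > self.max_price_usd:
            return False
        self.bids[supplier] = price_usd
        return True

    def award(self) -> str:
        """Pick the lowest quote and move that amount into escrow."""
        self._require(State.OPEN)
        if not self.bids:
            raise ValueError("no valid bids")
        self.winner = min(self.bids, key=self.bids.get)
        self.escrow_usd = self.bids[self.winner]
        self.state = State.AWARDED
        return self.winner

    def deliver(self) -> None:
        self._require(State.AWARDED)
        self.state = State.DELIVERED

    def confirm(self) -> float:
        """Buyer accepts the delivery; escrow is released to the supplier."""
        self._require(State.DELIVERED)
        self.paid_usd, self.escrow_usd = self.escrow_usd, 0.0
        self.state = State.SETTLED
        return self.paid_usd

    def claw_back(self) -> float:
        """The result fails after settlement; the payment returns to the buyer."""
        self._require(State.SETTLED)
        refund, self.paid_usd = self.paid_usd, 0.0
        self.state = State.CLAWED_BACK
        return refund


rfq = RFQ("resolve support ticket", "ticket not reopened within 7 days",
          max_price_usd=2.00)
rfq.bid("agent-a", 1.20)
rfq.bid("agent-b", 0.90)
rfq.bid("agent-c", 2.50)  # rejected: above the buyer's ceiling
winner = rfq.award()
rfq.deliver()
paid = rfq.confirm()
refund = rfq.claw_back()  # e.g. the ticket was reopened after settlement
```

A real market would likely weigh reputation alongside price when awarding and would gate `confirm` behind a settlement delay; the sketch only shows why escrow and clawback need explicit states.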

This is the most valuable layer, but also the hardest to verify, as results may be delayed, subjective, and easily manipulated. A support ticket might be reopened; a PR might pass its tests but still introduce poor architecture.

Total cost = Cost of completing the work + Cost of assuming risk
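One way to read this identity is to treat the cost of assuming risk as an expected loss: the probability that the result fails times what a failure costs the buyer. That interpretation, the `total_cost` helper, and the numbers below are assumptions for illustration; the article does not specify how risk is priced.

```python
def total_cost(work_cost_usd: float, failure_prob: float,
               failure_loss_usd: float) -> float:
    """Cost of completing the work plus the expected cost of the risk assumed."""
    return work_cost_usd + failure_prob * failure_loss_usd


# Hypothetical suppliers for the same task; a failed result costs the buyer $20.
cheap_but_risky = total_cost(work_cost_usd=0.50, failure_prob=0.10, failure_loss_usd=20.0)
pricier_but_reliable = total_cost(work_cost_usd=1.50, failure_prob=0.01, failure_loss_usd=20.0)

print(f"cheap but risky:      ${cheap_but_risky:.2f}")
print(f"pricier but reliable: ${pricier_but_reliable:.2f}")
```

Under these assumed numbers, the supplier with the lower sticker price is the more expensive one once risk is counted, which is why outcome markets need verification and reputation on top of price.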

A workflow doesn't automatically become a market just because intelligence has become commoditized or cheaper. Some tasks are highly dependent on private context, such as customer history or internal policies. The more a task relies on context, the less likely it is to be cleanly cleared in an open market. [@hypersoren https://hypersoren.xyz/posts/cybernetic-arbitrage/]

The market needs to reveal which labor categories will expand and which will contract.

Machine labor vs human labor

Anjali Shriva, in her draft on mechanism design, notes that AI narratives are often framed as substitution. In reality, it is more like a coordination problem: how work, attribution, incentives, and value are reorganized when both humans and machines participate in production.

Today, many internal AI use cases remain stuck because employees privately use AI, workflows remain siloed to individuals, and businesses cannot price these productivity gains or scale these benefits.

Most automatable tasks will likely be transferred to machines. Some roles will shift to human oversight, accountability, training, and context management. In certain cases, the final 1% of human judgment will become more valuable, as it can unlock the 99% of automated work at scale.

Rachel Su Park’s “Brave New World of AI Markets” points out that AI’s TAM should not be simply modeled as a replacement for current human labor expenditures, as it simultaneously alters both price and quantity. As the cost of work decreases, unit prices may fall, but consumption volumes may expand, as existing work is consumed more frequently and entirely new types of work, previously uneconomical, become viable. The article summarizes this as:

P × Q: Market size = Price per unit of work × Quantity of work consumed

If AI makes customer service interactions cheaper, companies can offer 24/7 availability. This market won't just be a cheaper version of the old customer service labor market—it could become a much larger customer engagement market.
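A small worked example of the P × Q argument, using made-up numbers for a customer service market. The prices and volumes are purely hypothetical and are not drawn from the cited article.

```python
def market_size(price_per_unit: float, quantity: float) -> float:
    """Market size = price per unit of work x quantity of work consumed."""
    return price_per_unit * quantity


# Hypothetical: human-handled interactions at $5.00 each, 1 million per year.
before = market_size(price_per_unit=5.00, quantity=1_000_000)

# Hypothetical: AI cuts the unit price by 90%, and always-on availability
# plus newly viable use cases raise volume 20x.
after = market_size(price_per_unit=0.50, quantity=20_000_000)

print(f"before: ${before:,.0f}  after: ${after:,.0f}  ratio: {after / before:.1f}x")
```

With these assumptions the unit price falls tenfold while the market doubles. Whether that happens in practice depends on how strongly quantity responds to price, and the sketch does not estimate that.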

AI is an expansive market because demand does not remain constant when the cost of work decreases.

Labor layer

The machine labor market should begin with jobs that have clearly defined specifications. GPU hours sit too close to the inputs: they only tell you what powered the work. Full outcomes sit at the other extreme: they are too complex and too context-dependent to price cleanly today. As verification, reputation, and risk/insurance pricing are gradually taken over by machines, the market will continue to evolve toward a pure outcomes layer.

Machine labor can become tradable because buyers will increasingly care less about which model or which GPU produced the work, and more about whether the work itself meets the minimum standards and grade specified in the contract at the correct price. Agents will be even less concerned about these underlying sources.

Machines can now directly perform work with economic utility—work that can be defined, measured, priced, procured, and ultimately traded. Electricity, computing power, models, and tokens remain important, but they are all still upstream.

The real work is completed downstream, and the market is moving toward a simpler object: machine labor.