NVIDIA Launches Vera CPU, Aiming to Address Bottlenecks in the AI Agent Era

On June 1, NVIDIA unveiled the Vera CPU at the GTC Taipei 2026 event held during Computex Taipei, alongside its next-generation AI supercomputing platform, Vera Rubin, with early customers including OpenAI and Anthropic.

This is NVIDIA’s first standalone CPU product line; the company’s growth over the past two decades has been built almost entirely on GPUs. At the launch event, NVIDIA CEO Jensen Huang said that in the era of AI agents the CPU has become a critical bottleneck for data center performance and must not be allowed to slow the token output of AI factories.

In May, AMD CEO Lisa Su said on the company's earnings call that AMD was doubling its estimate of the server CPU market from $60 billion to over $120 billion, and raising the expected compound annual growth rate for 2025 to 2030 from 18% to 35%.

According to IDC, the global server market is projected to reach $444.1 billion in 2025, up 80.4% year over year, with AI servers accounting for most of the increase. In a recent semiconductor industry report, UBS forecasts that the potential market for server CPUs will grow from approximately $30 billion in 2025 to around $170 billion by 2030, a more than fivefold increase over five years.
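The growth figures above can be cross-checked with a short calculation. The Python sketch below computes the compound annual growth rate implied by the UBS forecast, and the 2025 base implied by each of AMD's forecasts. It assumes that AMD's $60 billion and $120 billion figures both refer to 2030 and that growth compounds over five annual periods; neither assumption is stated in the source, and the function names are illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate taking `start` to `end` over `years` periods."""
    return (end / start) ** (1 / years) - 1


def implied_base(end: float, rate: float, years: int) -> float:
    """Starting value that reaches `end` after compounding at `rate` for `years` periods."""
    return end / (1 + rate) ** years


YEARS = 5  # 2025 to 2030

# UBS: about $30 billion in 2025 to about $170 billion in 2030.
ubs_multiple = 170 / 30
ubs_cagr = cagr(30, 170, YEARS)

# AMD: assumed here to be 2030 market sizes paired with the old and new CAGR.
amd_old_base = implied_base(60, 0.18, YEARS)
amd_new_base = implied_base(120, 0.35, YEARS)

print(f"UBS growth multiple: {ubs_multiple:.2f}x, implied CAGR: {ubs_cagr:.1%}")
print(f"AMD implied 2025 base (old forecast): ${amd_old_base:.1f}B")
print(f"AMD implied 2025 base (new forecast): ${amd_new_base:.1f}B")
```

Under these assumptions, the UBS forecast implies growth of roughly 41% a year, and both AMD forecasts imply a 2025 base in the mid-$20 billion range, broadly in line with UBS's starting point of about $30 billion.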

According to data from market research firm Mercury Research, in the first quarter of 2026 AMD’s revenue share in the server CPU market reached 46.2%, while Intel’s was 53.8%. However, AMD’s shipment share was only 33.2%, with Intel still accounting for 66.8%. In other words, AMD captured a far larger share of revenue than of units, which implies a higher average selling price per chip and points to the pricing power of its high-core-count products in the quarter.
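The gap between revenue share and shipment share can be turned into a rough price comparison. The sketch below divides each vendor's revenue share by its unit share to get an average selling price (ASP) index relative to the market average. It assumes the two sets of shares cover the same market and the same two vendors only; the variable and function names are illustrative, and the result is an implied ratio, not a reported price.

```python
# Shares reported by Mercury Research for Q1 2026, in percent.
revenue_share = {"AMD": 46.2, "Intel": 53.8}
shipment_share = {"AMD": 33.2, "Intel": 66.8}


def asp_index(vendor: str) -> float:
    """Revenue share over unit share: the vendor's ASP relative to the market average (1.0)."""
    return revenue_share[vendor] / shipment_share[vendor]


amd_index = asp_index("AMD")
intel_index = asp_index("Intel")
asp_ratio = amd_index / intel_index  # AMD's implied ASP as a multiple of Intel's

print(f"AMD ASP index:   {amd_index:.2f}")
print(f"Intel ASP index: {intel_index:.2f}")
print(f"AMD ASP / Intel ASP: {asp_ratio:.2f}")
```

On these figures, AMD's implied average selling price comes out at roughly 1.7 times Intel's, even though Intel still takes the larger share of total revenue.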

Lin Meibing, Chief Analyst at ICTIME, told the Economic Observer that the CPU is the most unexpected variable in this AI cycle. As AI moves from conversation to agents, CPU demand in inference has surpassed that in training.

The GPU is waiting for the CPU

In November 2025, Intel and Georgia Tech jointly published a paper titled "A CPU-Centric Perspective on Agentic AI." In this paper, the research team conducted empirical measurements on five typical agent workloads, revealing that CPU-based tool handling accounted for 43.8% to 90.6% of total latency.
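The latency shares reported in the paper put a ceiling on what faster GPUs alone can deliver. A minimal sketch using Amdahl's law follows. It assumes the rest of the latency is fully GPU-accelerable and that CPU and GPU phases run one after the other without overlap; these simplifications are not taken from the paper, and the function name is illustrative.

```python
def overall_speedup(cpu_fraction: float, gpu_speedup: float = float("inf")) -> float:
    """Amdahl's law: end-to-end speedup when only the non-CPU share of latency gets faster."""
    accelerated_fraction = 1.0 - cpu_fraction
    return 1.0 / (cpu_fraction + accelerated_fraction / gpu_speedup)


# CPU-side tool handling share of total latency, low and high ends of the reported range.
for cpu_fraction in (0.438, 0.906):
    with_2x_gpu = overall_speedup(cpu_fraction, gpu_speedup=2.0)
    ceiling = overall_speedup(cpu_fraction)  # infinitely fast GPU
    print(
        f"CPU share {cpu_fraction:.1%}: "
        f"2x faster GPU gives {with_2x_gpu:.2f}x, ceiling is {ceiling:.2f}x"
    )
```

At the high end of the range, even an infinitely fast GPU would shorten the task by only about 10%; at the low end the ceiling is about 2.3 times. Under these assumptions, further gains have to come from the CPU side.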

A securities analyst who has long followed the semiconductor sector noted that during the training phase of large models, the CPU accounts for only about 10% to 30% of the workload, rising to nearly 40% in some cases. The vast majority of computation is handled by GPUs. This is because training large AI models is a highly regular process in which billions of parameters repeatedly go through matrix multiplications over massive datasets, exactly the kind of task GPU parallel architectures are designed for. CPUs, in contrast, handle data loading, communication scheduling, and result copying, but do not take part in the core matrix operations.

However, during the inference phase, this ratio reverses: the CPU's share of the workload rises to over 70%, and is even higher in agent scenarios. This is because agent tasks require multi-step reasoning, calling external tools, executing code, reading and writing databases, searching the web, and orchestrating intermediate results into a final output.

Programming assistants, data analysis tools, and automated research agents all fall into this category and represent the fastest-growing use cases for large models today. These tasks share common characteristics: they are control-flow intensive, involve complex branching, and require frequent input and output. GPUs experience significantly reduced utilization when handling these serial, fragmented tasks.

Several industry professionals noted that in agent tasks, overall GPU utilization is typically below 50%, well under the 70% to 85% seen in traditional inference services. In agent mode, token consumption is often 20 to 30 times higher than in regular conversations, because a single user interaction frequently involves dozens of tool calls and intermediate reasoning steps.

According to IDC, the global number of tasks executed by agents is expected to grow from approximately 44 billion in 2025 to over 400 trillion in 2030.

Intel management stated on the Q1 2026 earnings call that the number of CPU cores required per gigawatt of power consumption in the AI agent era could rise from approximately 30 million today to 120 million. Market research firm Gartner also forecasts that by 2027, 40% of agent projects will be scaled back or canceled due to infrastructure cost overruns, with a significant portion of these overruns stemming from ongoing tool invocation and context management overhead on the CPU side.

Agents generate large amounts of intermediate data when handling long conversations and complex tasks. During inference, the system must keep the state of all prior dialogue and tool-call results available; the structure that holds this state is known in the industry as the KV cache (key-value cache). The cache grows with every conversation turn, but GPU memory is severely limited: NVIDIA’s H100 offers only 80GB, and the newer B200 provides just 192GB. The intermediate data from a single complex agent task can easily exceed these limits.
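To give a sense of scale, the size of a KV cache can be estimated from a model's shape: each token stores one key and one value vector per layer per KV head. The sketch below applies that formula to a hypothetical 80-layer model with 8 KV heads, a head dimension of 128, and 16-bit values. The model shape and the 128K-token context are illustrative assumptions, not figures from the article, and the estimate ignores the memory taken up by the model weights themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int

    def kv_bytes_per_token(self) -> int:
        # Factor of 2: one key vector and one value vector per layer per KV head.
        return 2 * self.num_layers * self.num_kv_heads * self.head_dim * self.bytes_per_value


# Hypothetical model shape, chosen only for illustration.
HYPOTHETICAL_MODEL = ModelShape(num_layers=80, num_kv_heads=8, head_dim=128, bytes_per_value=2)
CONTEXT_TOKENS = 128 * 1024  # assumed context length for one long agent session


def kv_cache_bytes(model: ModelShape, tokens: int) -> int:
    """Total KV cache size in bytes for one session holding `tokens` tokens."""
    return model.kv_bytes_per_token() * tokens


def sessions_that_fit(model: ModelShape, tokens: int, memory_gb: float) -> int:
    """Whole sessions whose KV cache fits in `memory_gb` (decimal GB), ignoring weights."""
    return int(memory_gb * 1e9 // kv_cache_bytes(model, tokens))


per_session_gb = kv_cache_bytes(HYPOTHETICAL_MODEL, CONTEXT_TOKENS) / 1e9
print(f"KV cache per session: {per_session_gb:.1f} GB")
for name, memory_gb in (("H100", 80), ("B200", 192)):
    fit = sessions_that_fit(HYPOTHETICAL_MODEL, CONTEXT_TOKENS, memory_gb)
    print(f"{name} ({memory_gb} GB): room for {fit} such session(s)")
```

With this assumed shape, one long session needs about 43 GB of cache, so an 80GB card holds one such session and a 192GB card holds four, before any memory is set aside for the model itself.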

Currently, the industry commonly addresses this by offloading this intermediate data from the GPU to the CPU side. CPUs can be paired with DDR5 memory, with capacity reaching several terabytes per system, an order of magnitude or more beyond GPU memory.

The CXL Consortium, whose members include chip companies such as Intel, AMD, and Arm, released the CXL 4.0 specification in November 2025. CXL (Compute Express Link) is an open standard for high-speed interconnects between chips; it enables multiple CPUs to share a single large-capacity memory pool and reduces the overhead of moving data between chips.

As a result, the CPU is no longer responsible only for task scheduling; it also handles data storage and memory management during AI inference.

In addition, CPUs themselves have undergone intensive technological upgrades over the past few years. The number of cores in a server CPU has risen from 28 in 2017 to 288 (Intel Clearwater Forest) and 256 (AMD Venice) by 2026, roughly a tenfold increase.

In 2023, Intel introduced the AMX (Advanced Matrix Extensions) instruction set, giving CPUs a dedicated matrix computation unit for the first time. According to Intel's test data, fourth-generation Xeon processors with AMX achieve up to a 10x improvement in AI performance over the previous generation in deep learning inference scenarios. The memory subsystem has also been upgraded from DDR4 to DDR5, doubling both bandwidth and capacity per platform.

Upgrades in core count and instruction sets have been accompanied by changes in the CPU-to-GPU ratio. During the Q1 2026 earnings call, Intel CEO Lip-Bu Tan stated that in training scenarios, the typical ratio is 7 to 8 GPUs per CPU, while in inference scenarios it narrows to 3 to 4 GPUs per CPU. In agent scenarios, the ratio is expected to narrow further to 1:1.

Intel CFO David Zinsner added during the same call that the overall industry ratio of CPUs to GPUs has narrowed from 1:8 to approximately 1:4.
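The effect of these ratios on CPU demand is easy to work out. The sketch below counts the CPUs needed to serve a fixed GPU fleet under each scenario, using the upper end of each quoted range (8, 4, and 1 GPUs per CPU). The fleet size of 100,000 GPUs is a hypothetical figure chosen for illustration, as are the names used in the code.

```python
import math

# GPUs served by one CPU in each scenario (upper end of the ranges quoted on the call).
GPUS_PER_CPU = {"training": 8, "inference": 4, "agent": 1}


def cpus_needed(gpu_count: int, gpus_per_cpu: int) -> int:
    """CPUs required so that every group of `gpus_per_cpu` GPUs has one CPU."""
    return math.ceil(gpu_count / gpus_per_cpu)


FLEET_GPUS = 100_000  # hypothetical data center size

baseline = cpus_needed(FLEET_GPUS, GPUS_PER_CPU["training"])
for scenario, ratio in GPUS_PER_CPU.items():
    cpus = cpus_needed(FLEET_GPUS, ratio)
    print(f"{scenario:>9}: {cpus:>7,} CPUs ({cpus / baseline:.0f}x the training baseline)")
```

For the same number of GPUs, moving from a training-style ratio to an inference-style ratio doubles the CPU count, and a 1:1 agent-style ratio multiplies it by eight.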

The first major price increase in over a decade

The above ratio changes have been reflected in product pricing.

Jia Bin, market head of a CPU distributor in Shenzhen, told reporters that starting in February 2026, Intel and AMD gradually increased prices across their entire server CPU lines, with overall hikes ranging between 10% and 15%. Premium AI server CPUs are currently trading at even higher spot premiums, and another price increase may occur in the second half of the year.

Jia Bin said that over the past decade, server CPUs have generally offered increased performance without price increases, with performance improving alongside process advancements while unit prices remained stable; this year’s price hike is rare in the industry. Intel’s main production lines have seen utilization rates rise from below 80% to 100%, with multiple models currently out of stock and delivery lead times extending to three to four months.

AMD is also facing tight capacity. Jia Bin said that 2026 is the first time in his career he has seen Intel and AMD server CPU capacities fully booked—“In the past, CPU supply was always sufficient, but this year it’s reversed.”

Jia Bin also noted that customer demand for CPUs when purchasing AI servers is splitting into two categories. One category consists of CPUs deployed inside racks alongside GPUs, prioritizing maximum core counts—over 128 cores—with an average price above $4,000, compared to traditional server CPUs, which average just over $2,000. The other category comprises CPUs deployed independently outside racks, used for agent tool execution, sandboxing, and task orchestration; these do not require peak performance, with around 64 cores being sufficient, but in much larger quantities.

Jia Bin said that, in an ideal scenario, each agent task gets a dedicated CPU, and independent deployment is more efficient than virtualized partitioning. The average price of an out-of-rack CPU is around $3,000, and “as the number of cores increases, the unit price rises disproportionately rather than linearly. Therefore, it is currently common practice among customers to use mid-range products outside the rack for volume deployment and flagship products inside the rack to ensure performance.”

In a semiconductor industry report titled "Rise of the Agents," released on June 11, Bank of America Securities raised its forecast for the total addressable market (TAM) for server CPUs in 2030 to over $170 billion, and for the first time segmented this market into three parts: approximately $30 billion for traditional cloud computing CPUs, approximately $70 billion for AI cluster head node CPUs, and approximately $70 billion for AI agent standalone node CPUs. The third segment, which was nearly zero in 2025, represents a completely new market emerging starting in 2026.

Morgan Stanley also predicted in a research report on June 4 that agentic AI will generate $32.5 billion to $60 billion of additional demand in the server CPU market by 2030. Zhongtai Securities, in a deep-dive CPU report released on June 7, defined 2026 as the "year of origin" for CPUs benefiting from AI scaling.

The aforementioned Bank of America Securities report also presents a historical comparison of shipment volumes: in 2022, AI CPU shipments accounted for 19% of AI accelerator (e.g., GPU) shipments; by 2025, this proportion is expected to rise to 51%, and is projected to reach 127% by 2030. According to this forecast, the number of CPUs in AI servers will surpass the number of GPUs within five years.
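The Bank of America Securities figures cited above can be checked with a few lines of arithmetic. The sketch below sums the three 2030 market segments and estimates when CPU shipments would overtake accelerator shipments. The crossover estimate assumes the ratio rises in a straight line between the 2025 and 2030 data points, which is a simplification introduced here and not something the report states; the names in the code are illustrative.

```python
# Bank of America Securities' 2030 server CPU TAM segments, in billions of USD.
tam_2030 = {
    "traditional cloud": 30,
    "AI cluster head node": 70,
    "AI agent standalone node": 70,
}
total_tam = sum(tam_2030.values())
segment_share = {name: value / total_tam for name, value in tam_2030.items()}

# AI CPU shipments as a percentage of AI accelerator shipments, by year.
attach_rate = {2022: 19.0, 2025: 51.0, 2030: 127.0}


def crossover_year(points: dict[int, float], threshold: float = 100.0) -> float:
    """Linearly interpolate the first point in time at which the rate reaches `threshold`."""
    years = sorted(points)
    for left, right in zip(years, years[1:]):
        low, high = points[left], points[right]
        if low < threshold <= high:
            return left + (threshold - low) / (high - low) * (right - left)
    raise ValueError("threshold is not crossed within the given years")


print(f"Total 2030 TAM: ${total_tam}B")
for name, share in segment_share.items():
    print(f"  {name}: {share:.1%}")
print(f"Estimated CPU/accelerator parity: {crossover_year(attach_rate):.1f}")
```

The three segments add up to the $170 billion headline figure, with the two AI-related segments together making up more than 80% of it. Under the straight-line assumption, CPU shipments would match accelerator shipments during 2028.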

New demands for domestic CPUs

At Computex Taipei, NVIDIA announced that its newly released Vera CPU is based on the Arm architecture, a CPU instruction set known for low power consumption and high energy efficiency and, alongside x86, one of the two dominant architectures. Vera supports deployment of up to 256 chips per rack with liquid cooling.

According to NVIDIA, in agent sandbox scenarios Vera delivers 1.8 times the performance of x86 processors. In NVIDIA's newly released Vera Rubin supercomputing cluster (NVIDIA's next-generation AI data center platform), a 40-rack POD, a minimal complete computing unit composed of multiple racks, contains 1,152 Rubin GPUs and up to 1,088 Vera CPUs, a ratio close to 1:1.

NVIDIA also noted that nearly 2.5 million Grace CPUs have been shipped to date, with CPU-related revenue expected to approach $20 billion in 2026.

Jia Bin believes that the aforementioned $20 billion figure uses a broad statistical scope, encompassing revenue from CPUs across various product forms, which is not entirely equivalent to revenue from the standalone sale of CPU chips in the traditional sense. However, even accounting for this difference in scope, this scale is already substantial for a company that did not have an independent CPU business in 2024.

Lin Meibing believes that NVIDIA’s entry into the CPU market carries more symbolic significance than product impact. Previously, AI servers were centered around GPUs, with CPUs serving only as supplementary components. Now, with the world’s largest GPU company directly entering the CPU market and securing OpenAI and Anthropic as its first customers, the market position of CPUs has changed dramatically from just two years ago.

According to AMD's Q1 2026 earnings report, the company's data center business revenue reached $5.775 billion, surpassing Intel's $5.1 billion during the same period for the first time. Additionally, Lisa Su stated during the earnings call that AMD has set a five-year goal of reaching $100 billion in annual data center revenue.

Intel CEO Lip-Bu Tan has also expressed firm confidence in the core role of CPUs in the AI era at multiple public events.

This also presents an opportunity for companies in China’s CPU supply chain. Jia Bin noted that leading domestic cloud providers are increasing their procurement of server CPUs this year, partly to complement GPU purchases for new AI data centers, and partly because the CPU-to-GPU ratio has narrowed from the previous 1:8 to 1:4 or even higher, so that the same data center now needs more than twice as many CPUs as it did last year.

In fact, China has already built a relatively complete supply chain centered on server CPUs.

Hygon Information Technology (688041.SH) is one of the largest suppliers of x86 architecture server CPUs in China. According to relevant financial reports, Hygon's revenue in 2025 reached RMB 14.377 billion, a year-over-year growth of 56.92%; in the first quarter of 2026, revenue was RMB 4.034 billion, with the year-over-year growth rate further increasing to 68.06%.

According to public information, Huawei Kunpeng follows a fully self-developed ARM stack approach, with the Kunpeng 920/950 and Ascend AI chips deeply integrated, primarily serving Huawei’s own ecosystem and the information technology innovation market.

In terms of supporting chips, Montage Technology (688008.SH) primarily produces memory interface chips, which relay signals between server CPUs and memory modules. According to public information, its memory interface chips held a 36.8% global market share in 2024, ranking first worldwide. Its other product line, PCIe Retimer chips, which amplify and restore signals in high-speed data transmission, achieved a 10.9% global market share in 2024, ranking second.

In packaging and testing, according to publicly available information, Tongfu Microelectronics (002156.SZ) is one of AMD’s most important packaging and testing partners worldwide.

Jia Bin told reporters that the software ecosystem for domestic chips is approaching a tipping point. He cited an example: when DeepSeek V4 was released, multiple domestic chip manufacturers completed adaptation on the same day, whereas adapting to DeepSeek R1 had taken one to two months. The sharp acceleration indicates that the software toolchain and driver layers for domestic chips are maturing rapidly, which benefits the entire domestic CPU and accelerator supply chain.

In Lin Meibing’s view, the benefits for domestic CPUs stem from two layers: one is industry growth driven by rising global demand for server CPUs, and the other is domestic substitution fueled by information technology innovation policies.

According to documents issued by SASAC in 2022, central and state-owned enterprises must finish replacing their information systems with domestic technology by the end of 2027. In interviews, the reporter learned that the share of domestically made high-end server CPUs in China remains low, leaving significant room for substitution. With less than two years remaining until the policy deadline, the delivery window for IT-innovation CPUs is narrowing, which will be a concentrated test of product maturity and shipping capability for domestic CPU makers such as Hygon Information and Loongson Technology (688047.SH).

Lin Meibing believes that this current cycle of CPU price increases differs from previous ones, as the additional demand stems from new requirements driven by AI agents, rather than upgrades driven by process advancements.

Zhiwei Ying’s assessment is similar. He said that over the past few years, market attention has been almost entirely focused on GPUs, but as AI applications truly enter large-scale deployment, the CPU’s roles in scheduling and management will only become more critical. In his view, this isn’t about CPUs replacing GPUs—GPUs remain essential—but the real differentiator going forward will be the synergy between CPUs and GPUs, not the individual performance specs of either chip.

This article is from the WeChat public account: Economic Observer, author: Zheng Chenye