Author: Ada, Schain TechFlow
San Jose, California, at the San Jose Convention Center, live from GTC.
NVIDIA’s chief scientist, Bill Dally, sat on stage opposite Google’s Jeff Dean. Midway through their conversation, Dally dropped a number: “Previously, migrating a standard cell library containing approximately 2,500 to 3,000 cells required a team of eight engineers working for about 10 months.”
He paused for a moment.
Now, he said, the same job runs overnight on a single GPU.
There was no gasp from the audience, because everyone who understood the statement knew what it meant: ten months of work by eight engineers had been erased overnight by a single GPU, one that NVIDIA itself had developed. Dally added that the results matched, and in some cases surpassed, human designs on area, power consumption, and delay.
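To put the two figures side by side, here is a back-of-the-envelope sketch in Python. Only the team size and the duration come from Dally's remark; the number of working hours in a month and the length of "overnight" are assumptions chosen for illustration, and all names are hypothetical.

```python
# Figures quoted on stage.
ENGINEERS = 8
MONTHS = 10

# Assumptions for illustration only (not from the talk).
WORK_HOURS_PER_MONTH = 21 * 8   # 21 working days of 8 hours each
OVERNIGHT_GPU_HOURS = 12        # one GPU running for one night


def engineer_months(engineers: int, months: int) -> int:
    """Total manual effort in engineer-months."""
    return engineers * months


def effort_ratio(engineers: int, months: int,
                 hours_per_month: int, gpu_hours: int) -> float:
    """Engineer-hours of the manual flow per GPU-hour of the automated one."""
    return engineers * months * hours_per_month / gpu_hours


manual_effort = engineer_months(ENGINEERS, MONTHS)
ratio = effort_ratio(ENGINEERS, MONTHS, WORK_HOURS_PER_MONTH, OVERNIGHT_GPU_HOURS)

print(f"Manual migration: {manual_effort} engineer-months")
print(f"Engineer-hours per GPU-hour, under the assumptions above: {ratio:.0f}")
```

The ratio depends entirely on the assumed hours, so it is best read as an order of magnitude rather than a measurement.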
The next day, the news was interpreted as "NVIDIA using AI to design GPUs."
But the truth behind this matter is far more intriguing than the news headline suggests.
What is NVIDIA running internally?
NVIDIA isn't running black boxes—it's using several toolchains refined over years.
NVCell is a reinforcement learning-based program built for the hardest part of standard cell library migration. PrefixRL tackles a long-standing research problem: where to place the lookahead stages in a carry-lookahead chain. Dally stated that the layouts generated by this system are “things humans would never think of,” improving key metrics by approximately 20% to 30% compared to human-designed layouts.
Additionally, there are two internal LLMs: ChipNeMo and BugNeMo. NVIDIA fed these models the RTL code, architecture documentation, and design specifications of every GPU in its history. According to Dally, this is equivalent to distilling twenty years of NVIDIA’s muscle memory, from G80 to Blackwell, into an internal model, so that new hires can immediately draw on the expertise of a seasoned engineer with two decades of experience.
So, can AI design GPUs?
Not yet. Dally's exact words were: "I very much hope that one day I can simply say, 'Design me a new GPU,' but we are still far from that point."
NVIDIA didn't use AI to design its GPUs. But what it did instead has made the entire industry dependent on it.
A $2 billion stake in the EDA heartland
On December 1, 2025, NVIDIA invested $2 billion in Synopsys, one of the three leading EDA companies. Both parties signed a joint development agreement to integrate NVIDIA’s accelerated computing stack into Synopsys’s entire EDA workflow, with Blackwell and the next-generation Rubin GPUs deeply integrated with Synopsys.ai.
Synopsys’s position needs to be explained: nearly every advanced-process chip in the world—such as Apple’s M series, AMD’s MI series, and Google’s TPU—is designed using Synopsys or Cadence’s toolchains. Together with Siemens EDA, these three companies dominate the foundational tools for chip design. You can choose not to use Qualcomm’s chips or TSMC’s fabrication lines, but you cannot escape their software.
Three months after investing in Synopsys, NVIDIA brought in Cadence, Siemens, and Dassault, announcing that all of them are developing AI-driven chip design tools based on NVIDIA GPUs.
NVIDIA's benchmark results are striking: Synopsys PrimeSim is 30 times faster on Blackwell, Proteus is 20 times faster, and Sentaurus achieves a 12x acceleration on B200 compared to CPUs. MediaTek sped up Cadence Spectre by 6x using H100. Astera Labs accelerated chip verification by 3.5x using Synopsys with NVIDIA.
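A speedup factor is easier to judge once it is converted into wall-clock time. The sketch below does that conversion for the factors reported above. The 24-hour baseline job is an assumption made purely for illustration, since the source gives no absolute runtimes, and the reported factors were measured on different tools and baselines, so the rows are not directly comparable.

```python
# Speedup factors as reported in the article.
REPORTED_SPEEDUPS = {
    "Synopsys PrimeSim (Blackwell)": 30.0,
    "Synopsys Proteus": 20.0,
    "Synopsys Sentaurus (B200 vs. CPUs)": 12.0,
    "Cadence Spectre (H100, MediaTek)": 6.0,
    "Chip verification (Astera Labs)": 3.5,
}

# Assumption for illustration only: a job that takes 24 hours on the baseline.
BASELINE_HOURS = 24.0


def accelerated_hours(baseline_hours: float, speedup: float) -> float:
    """Runtime after acceleration, given the baseline runtime and a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


for tool, speedup in REPORTED_SPEEDUPS.items():
    hours = accelerated_hours(BASELINE_HOURS, speedup)
    print(f"{tool}: {BASELINE_HOURS:.0f} h -> {hours:.1f} h ({speedup:g}x)")
```

Under that assumed baseline, a 30x speedup turns a day-long run into less than an hour, which is the difference between one design iteration per day and many.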
One detail worth highlighting separately: Cadence’s Millennium M2000 platform is marketed as “built exclusively for the EDA market, powered solely by NVIDIA Blackwell.”
The word “exclusively” deserves the most attention. Previously, EDA tools ran on CPUs, and both Intel and AMD could participate. Going forward, to use the fastest EDA tools, you can only buy NVIDIA GPUs.
The actual shape of the flywheel
NVIDIA’s flywheel, as most people understand it, works like this: sell GPUs to AI companies, AI companies train large models, the models prove the indispensability of GPUs, and more people buy GPUs.
This flywheel is already terrifying enough. But beneath it lies another layer.
NVIDIA uses its own tools to design the next generation of GPUs, creating a generational leap in design efficiency while tying the entire industry's EDA toolchain to its own hardware. Competitors want to catch up, but even the tools they need to do so must be rented from NVIDIA’s ecosystem.
This is the underlying anxiety behind the plunge in AMD’s stock after its earnings report. NVIDIA and Synopsys publicly state that the investment carries no obligation to purchase NVIDIA hardware, but the market knows full well that the initial releases of accelerated EDA features run exclusively on NVIDIA hardware. That leaves AMD and Intel reliant on a path optimized for their biggest competitor’s platform.
Imagine an AMD engineer someday wants to design a chip to compete with Blackwell. He opens Synopsys’s tool, which runs fastest on NVIDIA GPUs. He’s then forced to either endure a design cycle twice as slow or buy a large number of NVIDIA cards to design a chip meant to beat NVIDIA.
Shovels are still being sold, but the way they're sold has changed.
The real situation of domestic GPUs
At this point, we must present a set of sobering numbers.
In the same year that NVIDIA's net profit for fiscal year 2025 surpassed $70 billion, China's domestic GPU "four unicorns," Moore Threads, MetaX (Muxi), Biren, and Enflame (Suiyuan), were lining up at the IPO window.
According to Moore Threads' prospectus, the company incurred a cumulative net loss of RMB 5 billion from 2022 to 2024 and a further loss of RMB 271 million in the first half of 2025, leaving an accumulated deficit of RMB 1.478 billion as of June 30. The company’s management itself estimates that it will not achieve consolidated profitability until at least 2027. MetaX (Muxi) is in slightly better shape, with a cumulative loss exceeding RMB 3 billion over three years. The most severe case is Biren, which lost more than RMB 6.3 billion in three and a half years and recorded revenue of only RMB 58.9 million in the first half of 2025, less than a tenth of Moore Threads’ RMB 702 million in the same period.
Now consider the intensity of R&D investment. Moore Threads' R&D expenses as a percentage of revenue were 2,422.51% in 2022 and remained as high as 309.88% in 2024. The amount spent on R&D in a single year is more than three times its revenue. This is not business operations—it’s life support, sustained by continuous funding from the primary market and the recent opening of the STAR Market.
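The ratios in these two paragraphs can be checked with a few lines of Python. All inputs are the figures quoted above; the function and variable names are hypothetical and exist only for this sketch.

```python
# R&D expenses as a percentage of revenue, as quoted from the prospectus.
RD_INTENSITY_PCT = {2022: 2422.51, 2024: 309.88}

# First-half 2025 revenue in RMB millions, as quoted above.
BIREN_H1_2025_REVENUE = 58.9
MOORE_THREADS_H1_2025_REVENUE = 702.0


def rd_per_unit_revenue(intensity_pct: float) -> float:
    """RMB spent on R&D for every RMB 1 of revenue."""
    return intensity_pct / 100.0


def revenue_share(smaller: float, larger: float) -> float:
    """The smaller revenue expressed as a fraction of the larger one."""
    return smaller / larger


for year, pct in sorted(RD_INTENSITY_PCT.items()):
    print(f"{year}: RMB {rd_per_unit_revenue(pct):.2f} of R&D per RMB 1 of revenue")

share = revenue_share(BIREN_H1_2025_REVENUE, MOORE_THREADS_H1_2025_REVENUE)
print(f"Biren H1 2025 revenue as a share of Moore Threads': {share:.1%}")
```

An intensity of 309.88% means roughly RMB 3.10 of R&D spending for every RMB 1 of revenue, which is where "more than three times its revenue" comes from.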
The tools layer is an even tighter bottleneck. According to the 2022 IPO prospectus of Empyrean (Huada Jiutian), its tools only partially support the 5nm advanced process node. Primarius (Gailun Electronics) can cover the 7nm/5nm/3nm nodes, but it only provides point tools and is far from offering a full flow.
Liu Weiping, founder of Huada Jiutian, spoke candidly: "Domestic EDA tools still have significant shortcomings in supporting advanced processes, particularly today’s 7nm, 5nm, and 3nm nodes. Currently, domestic EDA can achieve 14nm capabilities; although 7nm process technology has been mastered, deeper integration with real-world applications requires coordinated efforts across the entire industrial chain."
In other words, domestic EDA tools for full-flow advanced process nodes are essentially unusable. Chinese GPU companies still rely on Synopsys and Cadence for chip design. In 2025, Trump briefly announced export controls on all critical software; although this was never fully implemented, EDA tools for advanced nodes below 7nm remain strictly controlled. When the license will be cut off is entirely out of our hands.
The capital market's reaction was surreal. On its listing day, MetaX (Muxi) closed at ¥829.9, surging 692.95% in a single day. After its listing, Moore Threads' stock price briefly rose to become the third-highest in the A-share market, trailing only Kweichow Moutai and Cambricon. Some media outlets estimated its total market capitalization at approximately ¥359.5 billion based on the prevailing stock price.
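The first-day close and the percentage gain together imply the price the gain was measured against. The sketch below works that out, assuming the 692.95% is measured from the offer price; the implied figure is an arithmetic consequence of the two numbers quoted above, not a number taken from the source.

```python
# First-day figures as quoted above.
FIRST_DAY_CLOSE = 829.9      # closing price in yuan
FIRST_DAY_GAIN_PCT = 692.95  # single-day gain in percent


def implied_reference_price(close_price: float, gain_pct: float) -> float:
    """Price that a given percentage gain must have started from."""
    return close_price / (1.0 + gain_pct / 100.0)


reference = implied_reference_price(FIRST_DAY_CLOSE, FIRST_DAY_GAIN_PCT)
multiple = FIRST_DAY_CLOSE / reference

print(f"Implied reference price: about {reference:.2f} yuan")
print(f"Close as a multiple of that price: {multiple:.2f}x")
```

A gain of 692.95% means the stock closed at nearly eight times its starting price, a useful figure to hold next to the cumulative losses listed earlier.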
The real business behind the numbers is that a group of companies still burning cash, still reliant on regulated overseas toolchains to design chips, are being valued on the secondary market as successors to NVIDIA.
The tools these companies use to design chips are becoming part of NVIDIA’s ecosystem. NVIDIA’s $2 billion commitment to Synopsys and Cadence’s labeling of the Millennium M2000 as “exclusively based on NVIDIA Blackwell” have turned the very act of catching up into a paradox.
A complete chain from design to manufacturing
Back to that conversation at GTC.
Dally remained humble throughout the event. “AI is still far from designing chips on its own”—NVIDIA has been saying this for four or five years. But each year, the phrasing changes. Four years ago, it was “AI can assist in design”; three years ago, “AI can automate certain steps”; this year, “it can accomplish in one night what eight people would take ten months to do.” Each year, they take one step forward—and each year, they leave a remark: “We’re still far from the ultimate goal.” Looking back three years later, what was once considered “far off” has already been achieved, and the new “far off” has been redefined at a level no competitor can yet reach.
Over the past twelve months, NVIDIA has done just one thing: applied AI to the most valuable and deeply protected segments of the chip supply chain, then sold these tools layer by layer across the entire industry.
The front end of chip design has been taken over by internal LLMs like ChipNeMo; the mid-stage tasks of standard cell library migration and layout optimization are handled by NVCell and PrefixRL; the entire EDA toolchain is tied to NVIDIA’s GPUs through NVIDIA’s $2 billion investment in Synopsys and Cadence’s “exclusive Blackwell-based” integration; and the lithography computation at the manufacturing stage is handled by cuLitho, which TSMC is already using.
From design to manufacturing, NVIDIA has reimplemented every step with AI. Each step leads to the same conclusion: if you want the fastest tools, you need to buy NVIDIA’s GPUs.
For every competitor aiming to design a chip capable of outperforming Blackwell, the most awkward reality has already arrived: the fastest version of the EDA tools needed to design the chip runs on NVIDIA GPUs; the fastest algorithm library for computational lithography is provided by NVIDIA; and the computing power required to train the AI used in the design process still comes from NVIDIA’s GPUs.
The person you need to defeat is renting you all the tools you need to defeat them. The rent is paid annually, and the contract increases in price each year.
