Chinese AI PC N90 Pro Enters the Agent Era with 35B Model Support

AI-generated summary: Edge computing power has become the ticket of entry to the Agent era. At the GTC Taipei conference, Jensen Huang demonstrated a PC capable of running a personal Agent 24/7, marking AI's transition from the large-model era to the Agentic AI era. PCs are evolving from passive tools into intelligent personal computing hubs that can understand context, reason, and plan. While Microsoft and NVIDIA are reinventing the PC, China's Great Wall N90 Pro AI PC has officially launched. It uses Houmo Intelligence's M50 compute-in-memory chip, which draws just 10W yet runs 35B-parameter models smoothly on-device. On token costs, local execution brings marginal cost close to zero, and industry projections suggest that 80% of inference workloads will shift to the edge. AI PC shipments in China are expected to surge 146.5% year-over-year in 2026, and China's technology stack is now fully capable of delivering AI Agent computers.

Article author and source: 36Kr

At the recent GTC Taipei conference, Jensen Huang said that for the past 40 years, people have used PCs by opening applications, clicking, and typing. Now, Microsoft and NVIDIA are reinventing the PC.

He demonstrated a computer capable of running a personal agent 24/7, giving the public a tangible sense that AI is moving from the large language model era into the Agentic AI era.

The role of the PC is changing too: from a passive tool waiting for user input to a personal computing hub that can understand context, reason, plan, and invoke tools. Jensen Huang has called this the most significant overhaul of the PC's foundations since Windows 95.

Around the same time, China's Great Wall N90 Pro AI PC officially launched. It is positioned much like the Agent Computer Jensen Huang demonstrated: it is also designed around the Agent concept, and it runs large models smoothly on-device in a slim, lightweight chassis.

Two technical pathways are advancing in parallel, and both point to the same conclusion: edge-side computing power is the ticket of entry to the Agent era.

How, then, do the Chinese solutions differ in computing power supply, economics, and security boundaries?

01. Reinventing the PC: What Does Agent-Native Require?

Jensen Huang breaks down the Agent Computer into three necessary conditions.

First, sufficient local computing power is required, as the Agent must simultaneously handle multiple model calls and inferences, with parameter scales reaching tens of billions. Second, a secure sandbox is needed to ensure the Agent runs in a protected environment and cannot arbitrarily access the entire machine’s resources. Third, an Agent runtime is required—a middleware layer capable of understanding user intent, breaking down tasks, and invoking tools.

These three conditions are necessary because the way an Agent operates is fundamentally different from traditional software. Traditional software follows a linear execution path: a user clicks a button, the software performs one function, and then it ends.

The agent operates in a cyclical manner: it receives a vague instruction, breaks it down into multiple steps, invokes different tools, adjusts its next actions based on intermediate results, and continues until the task is completed. Throughout this process, each reasoning step requires computational power, each tool invocation demands permission control, and every step transition must be dynamically scheduled during execution.
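As a rough illustration of that cycle, the Python sketch below runs an agent loop in which a planner picks the next tool based on the intermediate results so far, and every tool call passes through a sandbox allowlist. It is a minimal sketch under stated assumptions: the tool names, the rule-based `next_step` planner (standing in for a model call), and the `Sandbox` class are all hypothetical and do not represent any vendor's Agent runtime.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Tool = Callable[[str], str]
Step = Tuple[str, str]  # (tool name, argument)


@dataclass
class Sandbox:
    """Hypothetical permission layer: only allowlisted tools may run."""
    allowed: Set[str]
    audit_log: List[str] = field(default_factory=list)

    def invoke(self, tools: Dict[str, Tool], name: str, arg: str) -> str:
        if name not in self.allowed:
            self.audit_log.append(f"DENIED {name}")
            raise PermissionError(f"tool '{name}' is not permitted")
        self.audit_log.append(f"CALL {name}({arg})")
        return tools[name](arg)


def next_step(instruction: str, history: List[Tuple[str, str]]) -> Optional[Step]:
    """Hypothetical rule-based planner standing in for a model call.

    It chooses the next action from the intermediate results so far
    and returns None once it judges the task complete.
    """
    if not history:
        return ("search_notes", instruction)
    last_tool, last_output = history[-1]
    if last_tool == "search_notes":
        return ("summarize", last_output)
    return None


def run_agent(instruction: str, tools: Dict[str, Tool], sandbox: Sandbox, max_steps: int = 10) -> str:
    """Plan, act, observe, repeat, with a step cap as a safety limit."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        action = next_step(instruction, history)
        if action is None:
            break
        tool_name, arg = action
        output = sandbox.invoke(tools, tool_name, arg)  # every call is permission-checked
        history.append((tool_name, output))
    return history[-1][1] if history else ""


# Hypothetical tools; "delete_files" exists but is deliberately not allowlisted.
tools: Dict[str, Tool] = {
    "search_notes": lambda q: f"3 notes about {q}",
    "summarize": lambda text: f"summary of: {text}",
    "delete_files": lambda path: f"deleted {path}",
}
sandbox = Sandbox(allowed={"search_notes", "summarize"})
print(run_agent("last week's meetings", tools, sandbox))
```

Routing every call through a single `invoke` method means that however the plan changes mid-task, each step hits the same permission check, which is the property the secure-sandbox condition asks for.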

Of the three conditions, the industry has focused first on computing power.

In 2024, when Microsoft introduced the Copilot+ PC standard, it required an NPU of only 40 TOPS, and at the time the industry generally considered this sufficient. Two years later, that assessment has been overturned. From OpenClaw's desktop automation to intelligent meeting assistants, large AI models have evolved from chat tools into practical productivity tools. A single task now requires multiple inference steps, and smaller models are simply not adequate. The industry now generally regards models with 35B or more parameters as the baseline.

Demand for computing power is growing far faster than chips iterate: a new chip generation typically takes about two years, while AI applications and multimodal large models change significantly every few months.

The impact of this disparity in pace has already been reflected across the industry chain. Leading companies in the field believe that currently, about 70%-80% of AI computing power is used for training, while 20%-30% is used for inference; however, this ratio is expected to reverse in the future. Data from TrendForce also shows that AI training computing power among the top five cloud providers in North America is projected to increase by 56% by 2026, while inference computing power is expected to surge by 122%.

As computing power increases, power consumption has become a new issue.

With traditional solutions, when computational power increases from tens of TOPS to over a hundred TOPS, the chip's power consumption and size grow linearly, making it impossible to fit into slim laptops.

Great Wall N90 Pro AI PC

The Great Wall N90 Pro's answer is to start from needs: work out what a laptop actually requires, then choose the chip accordingly.

Many AI chips were originally designed for data centers, drawing hundreds of watts and occupying considerable space; moved into end devices, they create problems with heat dissipation, battery life, and noise. The M50 chip in the Great Wall N90 Pro, however, is not derived from a server design.

The M50 is developed by Houmo Intelligence, and the core technology behind it is compute-in-memory. In traditional chips, computation and memory are separate, so data must be shuttled back and forth between them constantly, a process that consumes significant energy. Compute-in-memory tightly integrates computation and memory, eliminating lengthy data transfers and dramatically reducing power consumption.
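A back-of-envelope sketch shows why data movement dominates. For a dense model generating text one token at a time, roughly every weight must be read from memory once per token. The 4-bit weights and 10 tokens per second below are illustrative assumptions, not M50 specifications, and the sketch ignores KV-cache traffic and on-chip reuse.

```python
def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Size of the model weights in bytes."""
    return params * bits_per_weight / 8


def decode_weight_traffic(params: float, bits_per_weight: float, tokens_per_second: float) -> float:
    """Bytes per second of weight reads if every weight is fetched once per generated token."""
    return weight_bytes(params, bits_per_weight) * tokens_per_second


PARAMS = 35e9           # a 35B-parameter dense model, as in the article
BITS_PER_WEIGHT = 4     # assumption: 4-bit quantized weights
TOKENS_PER_SECOND = 10  # assumption: an illustrative generation speed

traffic = decode_weight_traffic(PARAMS, BITS_PER_WEIGHT, TOKENS_PER_SECOND)
print(f"weights: {weight_bytes(PARAMS, BITS_PER_WEIGHT) / 1e9:.1f} GB, "
      f"weight traffic: {traffic / 1e9:.0f} GB/s")
```

Under these assumptions, a conventional chip would move on the order of 175 GB of weights every second just to keep generating, which is the kind of traffic compute-in-memory aims to cut by computing where the weights are stored.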

When running a 35B model locally, the M50 chip draws about 10W, and the entire board stays under 15W. This means it can plug directly into an M.2 slot, just like a standard solid-state drive.

In the Agent Computer era, then, Chinese solutions to the edge computing problem take a clearly "demand-driven" approach. Rather than forcing server-grade chips into laptops from a technology-first standpoint, designers started from real-world user scenarios and built a chip tailored for laptops. Power consumption, thermal design, and battery life were treated as engineering constraints from the very beginning of the design process.

Great Wall chose to partner with Houmo Intelligence on deep co-optimization because it values Houmo's ability to commercialize compute-in-memory.

A chip drawing just 10W lets a lightweight laptop weighing slightly over 1 kg run a 35B-parameter large model smoothly on-device. That task previously required a GPU drawing more than 500W in a full-size tower workstation; now an ordinary laptop is enough.
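To put the power gap in perspective for an agent that runs 24/7, the sketch below converts the article's figures into daily energy. It assumes constant draw at the stated power, which overstates real consumption during idle periods; 500W is the article's lower bound for the GPU alone, excluding the rest of the workstation.

```python
HOURS_PER_DAY = 24


def daily_energy_kwh(watts: float, hours: float = HOURS_PER_DAY) -> float:
    """Energy in kWh for a device drawing `watts` continuously for `hours`."""
    return watts * hours / 1000


# Power figures quoted in the article (watts).
devices = {"M50 chip": 10, "M50 board": 15, "desktop GPU": 500}

for name, watts in devices.items():
    print(f"{name:<12} {daily_energy_kwh(watts):6.2f} kWh/day")

print(f"GPU-to-chip power ratio: {devices['desktop GPU'] / devices['M50 chip']:.0f}x")
```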

Once computing power and power consumption are both "good enough," the next essential concern is security. Agents depend on data by the very nature of their work, and local computing offers a natural advantage here: data never leaves the device.

Agent tasks often involve sensitive information such as meeting minutes, personal knowledge bases, and office documents, and processing them in the cloud significantly amplifies compliance risk. Running agents on-device keeps data in a local closed loop from input to output, delivering physical-level data security and compliance, which is a prerequisite for Agent Computers to be widely adopted in practice.

Jensen Huang has also repeatedly stressed the importance of security. The global AI industry has come to recognize that security is a prerequisite for agents to become widespread.

In 2026, AI PC adoption can already be measured in substantial market data. Gartner predicts that global AI PC shipments will reach 143 million units in 2026, 55% of the entire PC market, which indicates that AI PCs are on track to overtake traditional PCs as the dominant choice.

The Chinese market moves faster and has become the core engine driving the market. IDC predicts that although overall PC shipments in China are expected to decline by 0.8% in 2026, AI PC shipments will surge by 146.5% year-over-year, with a five-year compound annual growth rate of 58.7%, potentially accounting for 36.5% of the overall PC market by 2029.
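The cited forecasts also imply two figures that are easy to derive. The sketch below is simple arithmetic on the article's numbers, not an additional forecast; the article does not state the base year of IDC's five-year window.

```python
def implied_total(segment: float, share: float) -> float:
    """Total market size implied by a segment and its share of that total."""
    return segment / share


def growth_multiple(cagr: float, years: int) -> float:
    """Cumulative growth factor implied by a compound annual growth rate."""
    return (1 + cagr) ** years


# Gartner: 143 million AI PCs in 2026, 55% of all PCs shipped.
total_2026 = implied_total(143, 0.55)
# IDC: 58.7% five-year CAGR for AI PC shipments in China.
multiple = growth_multiple(0.587, 5)

print(f"Implied 2026 global PC shipments: {total_2026:.0f} million")
print(f"Five-year growth multiple at 58.7% CAGR: {multiple:.2f}x")
```

Gartner's figures put the 2026 global PC market at about 260 million units, and a 58.7% CAGR sustained for five years multiplies AI PC shipments roughly tenfold.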

At the operating system level, support for local compute is also being actively developed. Microsoft Windows 11 has integrated numerous AI features through ongoing updates, while domestic OS vendors such as Kylin are beginning to incorporate local Agent capabilities.

The entire industry chain, from chips and hardware to operating systems and agent applications, is preparing for natively agent-based PCs.

02. Do the Math on Tokens: How Important Is Edge Computing Power?

Computing power determines whether a model can run at all; token cost determines where it is most economical to run it.

As agents move toward wide deployment in 2026, this question is reshaping the entire business logic of AI computing. At GTC 2026 in March, Jensen Huang introduced token economics, dividing token services into five tiers:

The free tier is designed to attract users; the basic tier costs approximately $3 per million tokens, serving regular users; the advanced tier costs approximately $6 per million tokens, offering larger models and faster speeds; the high-speed tier costs approximately $45 per million tokens, supporting long context and deep reasoning; the premium tier costs approximately $150 per million tokens, tailored for ultra-long research tasks and real-time responses on critical paths.

He did the math: a researcher using 50 million tokens per day at $150 per million tokens costs about $7,500 per day, which is an acceptable cost for a research team.
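Huang's tier prices turn token bills into simple multiplication. The sketch below encodes the approximate per-million-token prices quoted above; the 30-day and 365-day extrapolations assume the researcher uses the same volume every day, which is an assumption for illustration.

```python
# Approximate prices per million tokens (USD), as quoted from Huang's GTC 2026 tiers.
TIER_PRICE_PER_MILLION = {
    "free": 0.0,
    "basic": 3.0,
    "advanced": 6.0,
    "high_speed": 45.0,
    "premium": 150.0,
}


def token_cost(tokens: int, tier: str) -> float:
    """Cost in USD of consuming `tokens` at the given tier's price."""
    return tokens / 1_000_000 * TIER_PRICE_PER_MILLION[tier]


daily = token_cost(50_000_000, "premium")  # the researcher example
print(f"daily ${daily:,.0f}, 30-day ${daily * 30:,.0f}, yearly ${daily * 365:,.0f}")
```

At that rate a single heavy user approaches $225,000 over 30 days, so monthly bills in the hundreds of thousands of dollars are easy to reach.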

Tokens are not a one-time purchase; they are consumed continuously as long as the AI is running. When AI agents are fully deployed, a single enterprise AI application can easily incur a monthly token bill of hundreds of thousands of dollars.

In March 2026, Alibaba established the Token Hub business group, with CEO Wu Yongming personally leading it, demonstrating that token management has indeed evolved from a technical issue into a business strategy. Currently, several domestic cloud service providers have already adjusted or are in the process of adjusting their API invocation pricing, with the cost per million tokens for certain models increasing multiple times in the short term.

It is foreseeable that tokens will serve not only as a billing unit but also as something that can be exchanged directly for scarce business resources.

This is where the business logic of edge computing becomes clear: a one-time investment in AI PC hardware eliminates token fees for every subsequent basic inference. That promise is undoubtedly compelling.

Because agents multiply token consumption, the edge's theoretical advantage of zero marginal cost becomes a real one. A commonly cited comparison: a high-end AI PC costs roughly RMB 10,000 to 20,000 in hardware, while a team that calls cloud APIs frequently can exceed that amount in token fees within just a few months.
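The break-even point is a single division. In the sketch below, the RMB 10,000 to 20,000 hardware range comes from the article, while the RMB 5,000 monthly cloud spend and the `local_share` parameter (the fraction of cloud spend that local inference can actually replace) are hypothetical; electricity and depreciation are ignored.

```python
import math


def breakeven_months(hardware_cost: float, monthly_cloud_spend: float, local_share: float = 1.0) -> float:
    """Months until avoided cloud token spend covers a one-time hardware purchase.

    local_share is the assumed fraction of cloud spend that local inference
    replaces. Electricity and depreciation are ignored.
    """
    avoided_per_month = monthly_cloud_spend * local_share
    if avoided_per_month <= 0:
        return math.inf  # nothing is saved, so the hardware never pays for itself
    return hardware_cost / avoided_per_month


HYPOTHETICAL_MONTHLY_SPEND_RMB = 5_000  # assumption, not from the article

for hardware_rmb in (10_000, 20_000):  # the article's price range for a high-end AI PC
    months = breakeven_months(hardware_rmb, HYPOTHETICAL_MONTHLY_SPEND_RMB)
    print(f"RMB {hardware_rmb:,}: {months:.1f} months")
```

With a hypothetical RMB 5,000 monthly bill fully replaced, the hardware pays for itself in two to four months; if only half the workload can move on-device, those figures double.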

Some in the industry describe the boundary between local and cloud inference in terms of three lines.

The first is model size: models with 120B parameters or fewer can already be run locally. The second is security and confidentiality: scenarios involving privacy and sensitive data must be processed locally. The third is commercialization: for Agent scenarios with high-frequency Token usage, local inference completely avoids cloud-based pay-per-use pricing.
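The three lines can be expressed as a routing rule. This is a hypothetical policy sketch: the 120B threshold comes from the article, but the field names, the order in which the rules are checked, and the choice to send occasional non-sensitive requests to the cloud are assumptions.

```python
from dataclasses import dataclass

# From the article: models with 120B parameters or fewer can already run locally.
LOCAL_MAX_PARAMS_B = 120


@dataclass
class InferenceRequest:
    model_params_b: float  # model size, in billions of parameters
    sensitive: bool        # involves private or confidential data
    high_frequency: bool   # part of a high-frequency agent workload


def route(req: InferenceRequest) -> str:
    """Return "local", "cloud", or "reject" under a hypothetical three-line policy."""
    fits_locally = req.model_params_b <= LOCAL_MAX_PARAMS_B  # line 1: model size
    if req.sensitive:
        # Line 2: sensitive data must stay on the device, so refuse if the model is too big.
        return "local" if fits_locally else "reject"
    if req.high_frequency and fits_locally:
        # Line 3: high-frequency token use avoids cloud pay-per-use pricing.
        return "local"
    # Assumption: occasional, non-sensitive requests may use larger cloud models.
    return "cloud"


print(route(InferenceRequest(35, sensitive=True, high_frequency=False)))   # local
print(route(InferenceRequest(35, sensitive=False, high_frequency=True)))   # local
print(route(InferenceRequest(400, sensitive=True, high_frequency=True)))   # reject
print(route(InferenceRequest(35, sensitive=False, high_frequency=False)))  # cloud
```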

From these three lines, a conclusion is taking shape: in the future, 80% of inference scenarios will run locally.

This conclusion is supported by a growing body of evidence. According to Omdia, a distributed architecture that dynamically schedules workloads across device, edge, and cloud, handling 80% of lightweight tasks locally, can cut the annual cloud cost for 100 million users from $5.5 billion to $1.2 billion, a saving of over $4.3 billion. The estimate assumes a baseline of 50 AI requests per user per day at a typical cost of $0.003 per request.
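The baseline in the Omdia estimate can be reproduced from its stated inputs. The sketch below multiplies users, daily requests, days per year, and per-request cost, then removes the share handled locally. It treats local requests as costing nothing, which is a simplification: it yields about $1.1 billion rather than Omdia's $1.2 billion, presumably because Omdia's model includes costs the article does not break down.

```python
USERS = 100_000_000
REQUESTS_PER_USER_PER_DAY = 50
COST_PER_REQUEST_USD = 0.003
DAYS_PER_YEAR = 365


def annual_cloud_cost(local_fraction: float = 0.0) -> float:
    """Annual cloud bill in USD when `local_fraction` of requests run on-device for free."""
    requests_per_year = USERS * REQUESTS_PER_USER_PER_DAY * DAYS_PER_YEAR
    return requests_per_year * (1 - local_fraction) * COST_PER_REQUEST_USD


baseline = annual_cloud_cost()
hybrid = annual_cloud_cost(local_fraction=0.8)
print(f"all-cloud: ${baseline / 1e9:.1f}B, 80% local: ${hybrid / 1e9:.1f}B")
```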

For businesses and agent developers, this is a number that cannot be ignored; for individual users, on-device computing power further lowers the barrier to using AI. Everyday agent tasks built on established reasoning workflows and stable processes no longer require buying expensive cloud computing quotas or worrying about a huge bill at the end of the month. Once you buy the device, the AI capability is already available locally.

Driven by token economics, demand for edge computing power has begun to be widely validated.

For example, NVIDIA released the RTX Spark PC superchip for Windows, and major OEMs such as Dell, Lenovo, HP, ASUS, and Acer are all included in the initial product lineup. A key selling point of these products is local AI execution without consuming cloud token quotas.

Chinese manufacturers have also moved swiftly. The launch of the Great Wall N90 Pro is a concrete market move in this round of on-device computing deployment. Powered by the M50, a mass-produced compute-in-memory chip, it runs a 35B model smoothly on-device. This means that users' high-frequency Agent commands consume tokens entirely on the device, incurring no cloud invocation fees.

Houmo Manjie M50 chip, Licheng LQ50 M.2 card

In other words, with support from the operating system and AI applications, the daily inference cost of a Great Wall N90 Pro after purchase is virtually zero.

Edge computing has therefore been significantly reassessed in the Agent era. Once regarded mainly as a cost-effective alternative to cloud computing, it has become an indispensable infrastructure layer in a compute architecture that is being reshaped by rising token consumption.

Jensen Huang has compared tokens to the oil of the digital world; by that analogy, edge computing power works like distributed energy nodes with their own oil fields: no pipelines required, yet able to meet local users' needs independently.

When oil prices continue to rise, the value of owning an independent oil field becomes evident.

03. How High Can China's Complete-System Solutions Reach?

Although overseas giants such as NVIDIA and Microsoft still dominate the global AI PC technology landscape, Chinese complete-system solutions, which started at nearly the same time, are quietly moving from following to parity.

The mass production and delivery of the Great Wall N90 Pro signify more than the launch of an Agent Computer product: they represent an end-to-end validation of a domestically developed technology stack.

PC users in China have consistently shown higher adoption of AI applications, stronger demand for everyday work and productivity gains, and more immediate sensitivity to data privacy and inference latency. All of this gives the advantages of edge computing a logical basis to be amplified in the Chinese market.

Local AI's low latency, zero risk, and personalized experience have rekindled PC replacement demand that had been severely dampened by the shift to mobile working. As a result, Chinese PC manufacturers have also shifted their competitive strategies in this AI PC upgrade wave.

For a long time, the core narrative around domestic PCs revolved around security and controllability, or cost-effective alternatives. The emergence of AI PCs has changed how product formats are defined. Today, security is no longer a standalone selling point but is bundled as an inherent attribute of edge-side computing power.

Behind this transformation, Chinese edge chips are proving themselves with practical, deployable products. For example, a 1L AI mini workstation uses four of the same M50 chips found in the Great Wall N90 Pro to reach a computing density of 640 TOPS/L, and it can deploy and run mainstream local large models such as Qwen3.6 right out of the box. The ultra-compact P7 AI host weighs only 300g, draws at most 30W, and supports local deployment of models with hundreds of billions of parameters.
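The workstation figure also implies a per-chip number. The sketch below assumes the 640 TOPS/L density is measured over the full 1L volume and split evenly across the four M50 chips; the per-chip result is derived from the article's figures, not quoted from a specification.

```python
def per_chip_tops(density_tops_per_litre: float, volume_litres: float, chips: int) -> float:
    """Compute per chip implied by a system's compute density, volume, and chip count."""
    return density_tops_per_litre * volume_litres / chips


print(per_chip_tops(640, 1.0, 4))  # 160.0
```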

These figures rank among the best in the world.

After testing the Great Wall N90 Pro, some users also remarked that it is "the fastest AI PC they've ever seen—faster than many models running on large desktop GPUs."

There is no need to lean on the rhetoric of domestic substitution; the product itself is the best answer.

Domestic PC manufacturers also have their own methodology for technology selection. For example, Great Wall prioritizes three practical criteria when choosing edge AI chips: whether the power consumption suits laptop scenarios, whether the product is already in mass production, and whether the chip vendor is willing to perform deep hardware-software co-optimization.

These three points directly address the three core questions surrounding the commercialization of edge AI chips: Can it be integrated? Can it be reliably supplied? Can the experience be optimized together?

Reportedly, the collaboration between Great Wall and Houmo Intelligence took about a year from product planning to test integration, plus another six months to go from initial testing to mass production. Only through this extended joint debugging could the product's stability and performance be fully realized.

As a domestic chip manufacturer, Houmo Intelligence aims to help domestic computers enter the top tier of the global AI PC market. While NVIDIA’s RTX Spark is expected to launch in autumn 2026, Agent Computers powered by the M50 have already achieved mass production and delivery ahead of this timeline.

In the industrial chess game of AI PCs, therefore, domestic technology stacks have not been followers; they have led consistently, progressing steadily through three phases: from “usable,” to “user-friendly,” to “intelligent.”

The “usable” phase addressed whether domestic software and hardware were compatible; the “user-friendly” phase involved continuously refining the experience to ensure smooth performance. In the “intelligent” phase, AI capabilities become the product's core defining feature, and the Great Wall N90 Pro stands at the pivotal moment of transition from “user-friendly” to “intelligent.”

2026 is widely regarded as the Year of the AI Agent. NVIDIA’s RTX Spark has set the performance benchmark for global AI PCs, while the mass production and delivery of Chinese system solutions offer another narrative thread:

In the new frontier of Agent computers, China’s domestic technology stack represents an independent, viable, and rapidly accelerating path. From chips to operating systems to complete systems, China’s industrial chain is now fully capable of delivering AI Agent computers.

The two paths serve different markets but ultimately converge on the same goal: making agents reliable, accessible, and affordable infrastructure for everyone.