NVIDIA RTX Spark redefines the AI PC with 1 petaflop of AI performance.

Over the past two years, PC manufacturers have repeatedly highlighted one metric when promoting "AI PCs": NPU performance. Yet, whether it's Intel's Lunar Lake at 45 TOPS or AMD's Strix Point at 50 TOPS, these figures have remained at a relatively modest level—sufficient for background blur, voice noise reduction, and running small-scale on-device models, but nothing more.

On May 31, NVIDIA unveiled the RTX Spark superchip at GTC 2026, pushing this figure to 1 petaflop, or 1,000 TOPS. That is not a 30% or 50% improvement but a jump of roughly 20 times, more than a full order of magnitude.

Several other announcements accompanied the chip. Microsoft has strengthened Windows’ native security mechanisms in coordination with RTX Spark and brought NVIDIA’s open-source sandbox runtime, OpenShell, to the Windows platform. Adobe announced a ground-up overhaul of Photoshop and Premiere, optimized specifically for RTX Spark’s unified memory architecture. And the first six OEMs confirmed that they will launch thin-and-light laptops and compact desktops built on the chip this fall.

What NVIDIA did at this year's GTC went beyond launching a new chip: it set out to establish a new hardware standard for the category of "personal AI computers."

When the GPU Becomes the Star of the PC

First, let’s look at the chip itself. According to data disclosed by NVIDIA at GTC, the RTX Spark integrates a Blackwell-architecture GPU with 6,144 CUDA cores, paired with a 20-core Arm-based Grace CPU co-designed with MediaTek, built on TSMC’s 3nm process. The key advancement lies in the memory architecture: up to 128GB of unified memory, allowing the CPU and GPU to share a single memory pool, eliminating the need to transfer data back and forth between them.

This inverts the logic of the traditional PC architecture.

The traditional PC architecture consists of an x86 CPU as the primary processor and a dedicated GPU as an optional add-on. Even in the recently emerging AI PC concept, Intel and AMD have adopted the approach of integrating an NPU within the CPU as an auxiliary module for AI acceleration, typically offering computing power of around 40 to 50 TOPS. The GPU remains an external component.

RTX Spark reverses those roles. This SoC elevates the GPU to the lead role and relegates the CPU to a supporting one. NVIDIA claims 1 petaflop of FP4 AI compute, which it equates to 1,000 TOPS: 20 times or more the NPU performance of the previous generation of AI PCs. This isn't just a speed boost on the same track; it's the starting gun on an entirely new one.
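The size of the jump can be checked with a few lines of arithmetic. The sketch below uses only the headline figures quoted in this article; the function name is illustrative. One caveat: NPU TOPS are typically quoted at INT8 precision, while NVIDIA's petaflop figure is FP4, so the ratio compares headline numbers rather than like-for-like workloads.

```python
# Headline AI throughput figures quoted in the article, in TOPS.
NPU_TOPS = {"Intel Lunar Lake": 45, "AMD Strix Point": 50}
RTX_SPARK_TOPS = 1_000  # 1 petaflop FP4, as claimed by NVIDIA


def speedup_over(baseline_tops: float, new_tops: float = RTX_SPARK_TOPS) -> float:
    """Return how many times larger new_tops is than baseline_tops."""
    return new_tops / baseline_tops


# Ratio of the claimed RTX Spark figure to each NPU figure.
ratios = {name: speedup_over(tops) for name, tops in NPU_TOPS.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

Against the two NPUs named earlier, the claimed figure works out to roughly 22 times and 20 times respectively.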

The speed of the OEM response supports this assessment. According to NVIDIA’s official announcement and subsequent reports from DIGITIMES, ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI will launch thin-and-light laptops and compact desktops powered by RTX Spark this fall, with Acer and Gigabyte models to follow shortly after. Nearly every major Windows PC brand is on board.

RTX Spark did not appear out of nowhere. In early 2025, the same combination of a Blackwell GPU and a Grace CPU was unveiled under the names Project DIGITS and DGX Spark, but at that time it was positioned as a Linux-based desktop supercomputer for developers, about the size of a small desktop PC. A year later, the architecture has been shrunk to fit within the thermal constraints of a thin-and-light laptop, the operating system has switched from Linux to Windows, and the target audience has expanded from AI developers to general consumers and enterprise users. This is the most significant change in NVIDIA’s consumer-grade release at GTC 2026: NVIDIA is no longer shipping a developer toy; it is opening the door to the consumer market.

Is running a 120B model locally sufficient?

The numbers for computing power and memory ultimately answer one question: What can it do?

NVIDIA's presentation stated that RTX Spark supports locally running large models with 120 billion parameters, with a context window of up to one million tokens. What does 120B mean? For reference, the current mainstream practice for running local models on consumer-grade hardware involves using an RTX 4090 with 24GB VRAM to run models in the 30B to 40B parameter range through quantization and compression. Smaller models, such as those with 9B parameters, run quickly on consumer GPUs. A jump from the 9B-to-40B range to 120B redefines what “sufficient” means for on-device AI.

128GB of unified memory is the foundation for all of this. In traditional PC architectures, the CPU has its own system memory, and the GPU has its own video memory, with a physical boundary between them. A model larger than the GPU’s video memory either cannot run at all or requires complex model partitioning and memory swapping, leading to a sharp drop in speed. The unified memory architecture eliminates this bottleneck by placing model data directly into a shared 128GB pool accessible to both the CPU and GPU. Apple first demonstrated the consumer-grade feasibility of this approach with Apple Silicon, and now NVIDIA has brought it to the Windows ecosystem.
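A back-of-the-envelope check shows why the size of the pool matters. The sketch below counts model weights only and assumes a dense model stored at a uniform precision; the KV cache, activations, and the operating system all need additional room, so "fits" here is a necessary condition, not a guarantee. The helper names are illustrative.

```python
UNIFIED_MEMORY_GB = 128   # largest RTX Spark configuration cited in this article
RTX_4090_VRAM_GB = 24     # consumer GPU used as the reference point


def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9


def fits(params_billion: float, bits_per_param: int, capacity_gb: float) -> bool:
    """True if the weights alone are smaller than the memory capacity."""
    return weights_gb(params_billion, bits_per_param) < capacity_gb


for bits in (16, 8, 4):
    size = weights_gb(120, bits)
    print(f"120B at {bits}-bit: {size:.0f} GB | "
          f"128 GB pool: {fits(120, bits, UNIFIED_MEMORY_GB)} | "
          f"24 GB VRAM: {fits(120, bits, RTX_4090_VRAM_GB)}")
```

Under these assumptions a 120B model needs about 240GB at 16-bit, 120GB at 8-bit, and 60GB at 4-bit, so only the quantized variants fit in a 128GB pool, and none of them fit in 24GB of video memory.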

In addition to large model inference, NVIDIA lists use cases such as 12K video editing, rendering 3D scenes larger than 90GB, and ray-traced gaming at over 100 FPS at 1440p resolution. What these scenarios share is the enormous volume of data processed in a single operation; traditional PCs either take several times longer to complete such tasks or cannot run them at all.

There is still a gap between "supported" and "smooth to use." NVIDIA has not disclosed the actual inference speed of the 120B model on RTX Spark, nor has it provided first-token latency data for million-token context scenarios. The key metric determining long-context inference speed is memory bandwidth. For reference, the DGX Spark, which also uses the GB10 core, has been measured at approximately 301 GB/s of memory bandwidth. This level of bandwidth is enough to run a 120B model, but with context windows in the millions of tokens, users may need to wait several seconds to see the first output token. The laptop version of RTX Spark may have even lower actual bandwidth due to power constraints.
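Why bandwidth matters can be sketched with a common rule of thumb: during token-by-token generation, a dense model has to read essentially all of its weights from memory for every new token, so memory bandwidth divided by weight size gives an upper bound on generation speed. The sketch below assumes a dense 120B model quantized to 4 bits and ignores KV-cache reads. Both are assumptions, not NVIDIA figures, and mixture-of-experts models, which read only their active parameters per token, would do better. It covers generation speed only, not first-token latency, which also depends on compute.

```python
DGX_SPARK_BANDWIDTH_GB_S = 301  # measured DGX Spark figure cited in this article


def dense_weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Weight size in GB for a dense model (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def max_decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling: every weight is read once per generated token."""
    return bandwidth_gb_s / weights_gb


weights = dense_weights_gb(120, 4)  # assumed 4-bit quantization
bound = max_decode_tokens_per_s(DGX_SPARK_BANDWIDTH_GB_S, weights)
print(f"Weights: {weights:.0f} GB, upper bound: {bound:.1f} tokens/s")
```

With these inputs the ceiling is about 5 tokens per second, and any reduction in bandwidth on a power-limited laptop would lower it proportionally.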

Putting the AI agent in a security cage

The other core announcement, beyond computing power, is the system-level collaboration between NVIDIA and Microsoft. Among GTC 2026’s consumer-focused releases, it may be the easiest to overlook and yet the most consequential for the industry.

When a computer capable of running a 120B model is handed to an AI agent that can autonomously operate the desktop, click buttons, and read and write files, the security question is no longer only “whether data might be lost” but “whether the agent might do things you don’t want it to do.” Until this issue is resolved, companies cannot deploy such devices to employees.

Microsoft and NVIDIA have provided a two-layer defense. First, Microsoft has enhanced Windows' native security mechanisms to monitor and constrain AI agent behavior at the operating system level. Second, NVIDIA has officially brought the OpenShell runtime to the Windows platform. According to NVIDIA’s official documentation, OpenShell is an open-source sandbox runtime that provides kernel-level isolation. It defines a controlled operational scope for AI agents, allowing them to execute tasks autonomously within this boundary while strictly limiting their permissions so that they cannot reach core system files, network connections, or sensitive user data.
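The idea of a "controlled operational scope" can be illustrated with a small allowlist check. This is a hypothetical sketch of the general technique only: the class, field names, and paths below are invented for illustration and are not OpenShell's API or configuration format, which this article does not describe. It also shows only the policy decision; a real sandbox enforces such decisions below the application, at the operating system or kernel level.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical allowlist policy; NOT OpenShell's real configuration format."""
    allowed_dirs: tuple          # directories the agent may read and write
    allow_network: bool = False  # deny network access unless explicitly granted

    def permits_file(self, path: str) -> bool:
        target = PureWindowsPath(path)
        # PureWindowsPath does not resolve "..", so reject it outright
        # to stop traversal out of an allowed directory.
        if ".." in target.parts:
            return False
        return any(target.is_relative_to(PureWindowsPath(root))
                   for root in self.allowed_dirs)

    def permits_network(self) -> bool:
        return self.allow_network


# Hypothetical workspace path, used only for this example.
policy = AgentPolicy(allowed_dirs=("C:/AgentWorkspace",))
for candidate in ("C:/AgentWorkspace/report.txt",
                  "C:/Windows/System32/config/SAM",
                  "C:/AgentWorkspace/../Users/me/Documents/tax.pdf"):
    verdict = "allow" if policy.permits_file(candidate) else "deny"
    print(f"{candidate} -> {verdict}")
```

The design choice worth noting is deny-by-default: anything not explicitly inside an allowed directory is refused, and network access stays off unless the policy grants it.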

The significance of this combination for enterprise procurement is clear. Prior to this, the concept of a "local AI agent" remained at the stage of technical demonstrations—hardware could run it, but the security framework was nonexistent. No enterprise IT department would dare include such devices on their procurement list in this state. NVIDIA and Microsoft have inserted a standardized isolation layer between hardware and applications, transforming "functional" into "manageable."

The performance overhead of OpenShell remains an open question. Sandbox isolation typically introduces some performance loss, but NVIDIA has not yet disclosed specific data on how much it affects inference speed or system responsiveness. Practical implementation challenges, such as deployment complexity on enterprise IT management platforms and compatibility with existing security policies, can only be validated once OEM devices are available on the market.

Why is Adobe willing to "rebuild from the ground up"?

The level of cooperation from software vendors is typically an indicator of whether a new hardware platform can establish itself successfully.

Adobe's announcements during GTC represent the most significant software-side development in this release cycle. According to NVIDIA's official blog and confirmation from Adobe executives, Adobe has initiated a foundational overhaul of Photoshop and Premiere, specifically optimized for RTX Spark’s unified memory architecture, claiming up to 2x improvements in AI and graphics performance.

"Underlying reconstruction" isn't about adding a plugin or creating an adaptation layer. On traditional PCs, the CPU and GPU each have their own memory spaces; when processing a massive PSD file or an 8K video timeline, data must be repeatedly transferred between these two memory systems—a major source of performance waste. RTX Spark’s unified memory allows the CPU and GPU to directly share a single 128GB space; this structural change delivers tangible value to professional creators' workflows. Adobe’s overhaul of its core code demonstrates that it recognizes this architecture as a lasting direction, not a one-time marketing gimmick.

However, neither NVIDIA nor Adobe has disclosed the baseline behind this “up to 2x” figure. Is the comparison against an x86 processor with a discrete GPU from the same generation, or against the NPU-based solution from the previous generation of AI PCs? The results would differ significantly. Until the benchmark conditions are made public, the figure cannot be independently assessed.

Blackmagic Design, ComfyUI, llama.cpp, OTOY, and several game companies have also announced support. The backing of ComfyUI and llama.cpp is noteworthy, as they are among the most active open-source tools in current local AI workflows. Early adoption by the developer community often reflects a platform’s ecosystem potential more accurately than commitments from large corporations.

NVIDIA is building an Apple-like integrated hardware-software experience on the Windows platform using its CUDA ecosystem and unified memory architecture. The difference is that Apple builds its wall alone, while NVIDIA has to convince Microsoft and ISVs to build one together. Adobe’s willingness to rebuild from the ground up suggests that at least the first brick of this wall has been laid.

Beyond the paper specifications

Back to the most practical question: Can these devices actually be purchased, and what is the experience like once you have them?

According to NVIDIA's announcement, the first RTX Spark devices will be available this fall, including thin-and-light laptops and compact desktops from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI. Models from Acer and Gigabyte will follow later. Specific pricing and exact release dates for all OEMs have not been disclosed.

More critical than pricing are several unknowns at the physical level. How will power consumption and thermal management be balanced when a chip capable of 1 petaflop is packed into a slim laptop? How does RTX Spark perform in everyday office tasks, and what is its battery life outside of AI scenarios? Will the actual bandwidth of the 128GB unified memory be noticeably reduced by power constraints in a laptop form factor?

These issues are the true test of industrial-scale deployment. The peak computing power of a chip on an engineering prototype is often very different from its real-world performance when used by consumers for eight hours a day. NVIDIA emphasized the energy efficiency of RTX Spark during its announcement but did not provide specific TDP figures or battery life data.

From the perspective of the PC industry landscape, the emergence of RTX Spark signals the formation of a new division of labor. For the past three decades, control over core PC chips has been held by x86 processor manufacturers; although GPU manufacturers have grown increasingly important, they have always remained “add-on components plugged into the motherboard.” NVIDIA’s latest offering is a complete SoC, integrating the CPU, GPU, and memory controller into a single chip, with the Arm-based CPU portion co-designed with MediaTek. The power structure of the PC supply chain is shifting from “x86 CPU with an optional GPU” to “GPU-centric SoC platforms.”

This transition won’t happen overnight. OEM pricing strategies, real-world product energy efficiency, ISV software compatibility progress, and enterprise customer procurement validation cycles—all of these factors determine whether RTX Spark becomes a new benchmark in the PC industry or merely another high-profile, low-impact tech demo. The answer won’t come until at least this fall.