Tensordyne Claims 13x Throughput Boost Over Nvidia's GB300 NVL72 AI Rack

CryptoBriefing

Tensordyne, a startup with offices in Sunnyvale and Munich, announced its Napier (TDN) AI inference processor on June 15. The company claims its TDN72 rack-scale system delivers 13 times higher throughput in tokens per second and 17 times more tokens per watt than Nvidia’s GB300 NVL72 rack. Both figures come from DeepSeek-R1 inference workloads.

The numbers behind the claim

Tensordyne says a single rack running its hardware can churn out roughly 363,000 tokens per second. The company pegs Nvidia’s equivalent rack at approximately 27,400 tokens per second on the same workload.
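As a quick sanity check, the two per-rack figures reproduce the headline multiple. The short Python sketch below uses only the numbers quoted above; the constant and function names are illustrative, not anything published by Tensordyne.

```python
# Claimed DeepSeek-R1 throughput per rack, as reported by Tensordyne (not independently verified)
TDN72_TOKENS_PER_S = 363_000
GB300_NVL72_TOKENS_PER_S = 27_400


def speedup(new: float, baseline: float) -> float:
    """Return how many times higher `new` is than `baseline`."""
    return new / baseline


ratio = speedup(TDN72_TOKENS_PER_S, GB300_NVL72_TOKENS_PER_S)
print(f"throughput ratio: {ratio:.2f}x")  # throughput ratio: 13.25x
```

The result, about 13.25x, is consistent with the rounded "13 times" in the announcement.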

The secret sauce is something called a logarithmic number system, or LNS, executed directly in hardware. Instead of doing math the way conventional chips do (multiplying big floating-point numbers together), LNS stores numbers as logarithms, which turns multiplication into addition. That is dramatically cheaper in terms of transistors and energy. The technique has been studied in academia for decades but has historically been considered too impractical for production silicon, partly because addition and subtraction, which are simple in conventional formats, become expensive in the log domain.
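To make the idea concrete, here is a minimal Python sketch of a generic LNS. It assumes base-2 logarithms stored as ordinary floats for readability. Real hardware would use a fixed-point log format, and Tensordyne has not published its number format, so this is not its design. The class and method names are hypothetical. The sketch shows why multiplication is cheap (add the logs) and why addition is the hard part (it needs the nonlinear term `log2(1 + 2**d)`, which hardware typically approximates).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LNS:
    """Hypothetical LNS value: a nonzero real stored as a sign and log2 of its magnitude."""
    sign: int        # +1 or -1
    log2_mag: float  # log2(|x|)

    @staticmethod
    def from_float(x: float) -> "LNS":
        if x == 0:
            raise ValueError("zero needs a special encoding in LNS")
        return LNS(1 if x > 0 else -1, math.log2(abs(x)))

    def to_float(self) -> float:
        return self.sign * 2.0 ** self.log2_mag

    def __mul__(self, other: "LNS") -> "LNS":
        # Multiplication becomes an addition of logarithms plus a sign product.
        return LNS(self.sign * other.sign, self.log2_mag + other.log2_mag)

    def __truediv__(self, other: "LNS") -> "LNS":
        # Division becomes a subtraction of logarithms.
        return LNS(self.sign * other.sign, self.log2_mag - other.log2_mag)

    def add_same_sign(self, other: "LNS") -> "LNS":
        # Addition is the costly case: log2(a + b) = hi + log2(1 + 2**(lo - hi)).
        # Hardware must approximate this nonlinear term (e.g. with lookup tables).
        if self.sign != other.sign:
            raise NotImplementedError("subtraction needs log2(1 - 2**d); omitted here")
        hi = max(self.log2_mag, other.log2_mag)
        lo = min(self.log2_mag, other.log2_mag)
        return LNS(self.sign, hi + math.log2(1.0 + 2.0 ** (lo - hi)))


a, b = LNS.from_float(6.0), LNS.from_float(-0.5)
print((a * b).to_float())  # approximately -3.0 (float rounding may add a tiny error)
print(a.add_same_sign(LNS.from_float(2.0)).to_float())  # approximately 8.0
```

The multiply path is a single addition, while the add path needs an extra exponential and logarithm; that asymmetry is the trade-off an LNS chip has to engineer around.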


Tensordyne built its Napier chip on TSMC’s 3nm process node, integrating both SRAM and HBM memory on-package. The full rack configuration stacks four pods of 72 chips each, totaling 288 chips, with a target power envelope of roughly 120 kW for the entire rack. The rack is designed for air cooling rather than liquid cooling.
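The rack figures can be combined with the earlier claims for some rough per-chip and efficiency numbers. The Python sketch below assumes the 120 kW target is the power figure behind the 17x tokens-per-watt claim; the company has not said so. Because the 13x and 17x multiples are rounded claims, the implied Nvidia rack power is only a ballpark. All names are illustrative.

```python
PODS_PER_RACK = 4
CHIPS_PER_POD = 72
RACK_POWER_W = 120_000              # Tensordyne's target envelope, not a measurement
TDN72_TOKENS_PER_S = 363_000        # claimed DeepSeek-R1 throughput
GB300_NVL72_TOKENS_PER_S = 27_400   # Tensordyne's figure for Nvidia's rack
CLAIMED_TOKENS_PER_WATT_RATIO = 17  # claimed efficiency multiple

chips = PODS_PER_RACK * CHIPS_PER_POD
# Upper bound: the rack budget also covers memory, interconnect, and fans.
watts_per_chip_upper_bound = RACK_POWER_W / chips
tokens_per_s_per_chip = TDN72_TOKENS_PER_S / chips

# tokens/W ratio = (T_tdn / P_tdn) / (T_nv / P_nv) = (T_tdn / T_nv) * (P_nv / P_tdn),
# so the Nvidia rack power implied by the claims is P_nv = ratio_w * P_tdn / (T_tdn / T_nv).
throughput_ratio = TDN72_TOKENS_PER_S / GB300_NVL72_TOKENS_PER_S
implied_gb300_power_w = RACK_POWER_W * CLAIMED_TOKENS_PER_WATT_RATIO / throughput_ratio

print(f"chips per rack: {chips}")  # chips per rack: 288
print(f"power per chip (upper bound): {watts_per_chip_upper_bound:.0f} W")  # 417 W
print(f"throughput per chip: {tokens_per_s_per_chip:.0f} tokens/s")  # 1260 tokens/s
print(f"implied GB300 NVL72 power: {implied_gb300_power_w / 1000:.0f} kW")  # 154 kW
```

The implied Nvidia figure is simply what the two claimed multiples require if they share the 120 kW baseline. It is not a measured value.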

The company developed its high-speed scale-up interconnect in collaboration with Broadcom and HPE Juniper. Broadcom contributes silicon development expertise, and HPE Juniper provides data center interconnect capability.

Production timeline and demand signals

Tensordyne says it has accumulated over $200 million in letters of intent and evaluations. Volume production is targeted for mid-2027, with initial shipments expected in late 2026.

The company’s pitch to customers is that each rack could generate tens of millions of dollars more in annual revenue compared to an equivalent Nvidia deployment.
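The revenue pitch is, at its core, extra tokens multiplied by a price. The Python sketch below shows that arithmetic. The article gives no token price or utilization rate, so the per-million-token prices and utilization values here are purely hypothetical. The sketch also ignores differences in hardware cost, power cost, and whether demand exists for the extra tokens.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000
TDN72_TOKENS_PER_S = 363_000        # claimed
GB300_NVL72_TOKENS_PER_S = 27_400   # Tensordyne's figure for Nvidia's rack


def extra_annual_revenue(price_per_million_tokens: float, utilization: float = 1.0) -> float:
    """Hypothetical extra revenue per rack per year, in dollars.

    price_per_million_tokens and utilization are illustrative assumptions,
    not figures from Tensordyne.
    """
    extra_tokens_per_s = TDN72_TOKENS_PER_S - GB300_NVL72_TOKENS_PER_S
    extra_tokens_per_year = extra_tokens_per_s * SECONDS_PER_YEAR * utilization
    return extra_tokens_per_year / 1e6 * price_per_million_tokens


for price in (1.0, 2.0, 5.0):  # hypothetical $ per million output tokens
    print(f"${price:.0f}/M tokens, 100% busy: ${extra_annual_revenue(price) / 1e6:.1f}M per year")
```

At a hypothetical $1 per million tokens and full utilization, the gap works out to roughly $10.6 million a year. The "tens of millions" figure therefore depends heavily on the price and utilization a customer actually achieves.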

Why this matters for the AI hardware market

Inference workloads are more predictable than training and can be optimized for specific model architectures. By focusing exclusively on inference rather than competing across the full training-and-inference stack, Tensordyne sidesteps Nvidia’s strongest competitive advantages.

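The 3nm TSMC process choice puts Tensordyne on roughly the same manufacturing node as Nvidia’s next-generation chips. The GB300 used as the benchmark baseline, however, is built on TSMC’s older 4NP (4nm-class) process, so part of any real performance gap could reflect that node difference rather than architectural innovation alone.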

Investors should watch for third-party benchmark validation, which may arrive around the time of initial shipments in late 2026.

Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.