Hewlett Packard Enterprise Boosts Private Cloud AI Token Throughput by 20%

CryptoBriefing

Hewlett Packard Enterprise announced updates to its Private Cloud AI platform on March 16, co-engineered with Nvidia, that deliver up to a 20% improvement in token throughput for AI inference tasks. New network expansion racks will allow the platform to scale to 128 GPUs, with availability slated for July 2026.

What’s actually changing

Token throughput is the number of tokens (chunks of text or other data) an AI model can process per second. A jump of up to 20% means enterprises running generative AI or agentic AI workloads can get meaningfully faster responses, or serve more requests in the same time, without swapping out hardware.
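To make the arithmetic concrete, here is a minimal sketch of what a 20% throughput gain does to processing time. The baseline rate of 1,000 tokens per second and the 60,000-token workload are hypothetical figures chosen for illustration, not HPE or Nvidia benchmarks, and the sketch assumes the full "up to 20%" gain is realized.

```python
def scaled_throughput(baseline_tps: float, gain: float) -> float:
    """Return tokens per second after a fractional gain (0.20 means +20%)."""
    return baseline_tps * (1.0 + gain)


def seconds_for(num_tokens: int, tokens_per_second: float) -> float:
    """Return the time in seconds needed to process num_tokens at a given rate."""
    return num_tokens / tokens_per_second


# Hypothetical figures, for illustration only.
baseline_tps = 1000.0   # assumed baseline throughput
workload = 60_000       # assumed number of tokens to process

improved_tps = scaled_throughput(baseline_tps, 0.20)
before = seconds_for(workload, baseline_tps)
after = seconds_for(workload, improved_tps)

print(f"before: {before:.1f} s, after: {after:.1f} s")
print(f"time saved: {(1 - after / before):.1%}")
```

Note that a 20% gain in throughput shortens the time for a fixed workload by about 16.7% (1 - 1/1.2), not by 20%, since time is inversely proportional to throughput.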

The platform now supports Nvidia RTX PRO 6000 Blackwell Server Edition GPUs, which are designed for enterprise data center deployments rather than the workstation or consumer market.

Scaling to 128 GPUs through the new expansion racks allows enterprises to run bigger models or serve more concurrent users. For organizations that started small with Private Cloud AI and need to grow, this removes what was previously a hard constraint.

HPE is also adding air-gapped deployment options, meaning the entire system can operate completely disconnected from external networks. That addresses the needs of defense contractors, healthcare systems, and financial institutions handling regulated data.

The platform ships as a turnkey solution bundling HPE’s server and storage hardware with Nvidia AI Enterprise software, which includes NIM inference microservices. Small-form-factor options are also part of the updated lineup.

The bigger picture: why enterprises are going private

HPE and Nvidia first started rolling out Private Cloud AI around mid-2024, and the product has since accumulated a series of updates that expanded GPU support, improved performance benchmarks, and added deployment flexibility.

Sky Co. is one notable customer that deployed HPE Private Cloud AI for secure on-premises AI operations as of June 2026.

HPE AI Essentials software is bundled alongside Nvidia AI Enterprise in the offering, giving customers a software stack that handles model deployment and monitoring.

What this means for investors

The competitive landscape includes Dell with its own AI factory offerings and Lenovo pushing into enterprise AI infrastructure. Cloud providers are also responding by offering reserved GPU instances with more predictable pricing.

The July 2026 availability for the expansion racks means meaningful revenue from the 128-GPU configurations likely won’t show up in HPE’s financials until late 2026 at the earliest.
