Hewlett Packard Enterprise Boosts Private Cloud AI Token Throughput by Up to 20%

On March 16, Hewlett Packard Enterprise announced updates to its Private Cloud AI platform, co-engineered with Nvidia, that deliver up to a 20% improvement in token throughput for AI inference tasks. New network expansion racks will allow the platform to scale to 128 GPUs, with availability slated for July 2026.

What’s actually changing

Token throughput is the number of tokens, the chunks of text (or other data) an AI model reads and generates, that a system can process per second. A gain of up to 20% means enterprises running generative AI or agentic AI workloads can serve more requests, or finish the same work sooner, without swapping out hardware.
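To make the arithmetic concrete, here is a minimal sketch of what a throughput gain means for a fixed amount of work. The baseline rate and workload size are hypothetical figures chosen for illustration, not HPE or Nvidia benchmarks, and the function names are invented for this example. Note that a 20% throughput gain shortens a fixed job by about 16.7%, not 20%.

```python
def seconds_for_workload(total_tokens: int, tokens_per_second: float) -> float:
    """Time needed to process a fixed number of tokens at a given throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return total_tokens / tokens_per_second


def time_saved_fraction(throughput_gain: float) -> float:
    """Fraction of wall-clock time saved on a fixed workload.

    throughput_gain is relative, e.g. 0.20 for a 20% improvement.
    """
    if throughput_gain <= -1.0:
        raise ValueError("throughput_gain must be greater than -1")
    return 1.0 - 1.0 / (1.0 + throughput_gain)


# Hypothetical numbers for illustration only (not vendor benchmarks).
baseline_tps = 1_000.0               # assumed baseline tokens per second
improved_tps = baseline_tps * 1.20   # the "up to 20%" best case
workload_tokens = 1_200_000          # assumed batch of tokens to process

before = seconds_for_workload(workload_tokens, baseline_tps)
after = seconds_for_workload(workload_tokens, improved_tps)

print(f"Before: {before:.0f} s, after: {after:.0f} s")
print(f"Time saved: {time_saved_fraction(0.20):.1%}")
```

The same relationship works in the other direction: in the time the baseline system processes 1,000 tokens, the improved one processes up to 1,200, which is where the extra serving capacity comes from.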

The platform now supports Nvidia RTX PRO 6000 Blackwell Server Edition GPUs, specifically designed for enterprise data center deployments rather than the workstation or consumer market.

Scaling to 128 GPUs through the new expansion racks allows enterprises to run bigger models or serve more concurrent users. For organizations that started small with Private Cloud AI and need to grow, this removes what was previously a hard constraint.

HPE is also adding air-gapped deployment options, in which the entire system operates completely disconnected from external networks. That addresses the needs of defense contractors, healthcare systems, and financial institutions handling regulated data.

The platform ships as a turnkey solution bundling HPE’s server and storage hardware with Nvidia AI Enterprise software, which includes NIM inference microservices. Small-form-factor options are also part of the updated lineup.

The bigger picture: why enterprises are going private

HPE and Nvidia began rolling out Private Cloud AI around mid-2024, and the product has since accumulated a series of updates that expanded GPU support, improved performance benchmarks, and added deployment flexibility.

Sky Co. is one notable customer that deployed HPE Private Cloud AI for secure on-premises AI operations as of June 2026.

HPE AI Essentials software is bundled alongside Nvidia AI Enterprise in the offering, giving customers a software stack that handles model deployment and monitoring.

What this means for investors

The competitive landscape includes Dell with its own AI factory offerings and Lenovo pushing into enterprise AI infrastructure. Cloud providers are also responding by offering reserved GPU instances with more predictable pricing.

Because the expansion racks are not slated to ship until July 2026, meaningful revenue from the 128-GPU configurations likely won’t show up in HPE’s financials until late 2026 at the earliest.