
Editor | Ze Nan
Is this how GPUs get stacked in the AI era?
According to a recent report by The Information that has drawn significant attention, Elon Musk’s xAI currently has a GPU utilization of only about 11%, and the optimization of its AI software stack has reportedly been unsatisfactory.

Currently, xAI operates approximately 550,000 NVIDIA GPUs across its Memphis and Colossus data center clusters, including H100 and H200 models, some of which are equipped with liquid cooling. Although these GPUs belong to the previous generation (preceding the latest Blackwell series), their scale is already astonishing.
Despite having such a vast inventory of GPUs, xAI’s Model FLOPs Utilization (MFU) is only 11%. To use an imperfect analogy, the roughly 550,000 GPUs already installed in xAI’s servers deliver usable compute equivalent to only about 60,000 GPUs. What exactly is causing such low efficiency?
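The analogy above is simple arithmetic, and a minimal sketch makes it explicit. It assumes MFU can be read as the fraction of peak FLOPs doing useful model work, so fleet size times MFU gives a rough "GPU-equivalent" count. The function name `effective_gpus` is hypothetical, and the 50% figure is the target xAI is reported to have set, not a measured value.

```python
def effective_gpus(total_gpus: int, mfu: float) -> float:
    """GPUs running at 100% of peak that would deliver the same useful FLOPs."""
    if not 0.0 <= mfu <= 1.0:
        raise ValueError("mfu must be a fraction between 0 and 1")
    return total_gpus * mfu


FLEET_SIZE = 550_000  # approximate fleet size cited in the article

# 0.11 is the reported MFU; 0.50 is the reported target, used here for comparison only.
for label, mfu in [("reported MFU", 0.11), ("stated target", 0.50)]:
    equivalents = effective_gpus(FLEET_SIZE, mfu)
    print(f"{label}: {mfu:.0%} -> ~{equivalents:,.0f} GPU-equivalents")
```

At 11%, the result is about 60,500, which is where the "about 60,000 GPUs" figure comes from.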
For smaller-scale deployments (e.g., 1,000–10,000 GPUs), coordination between multiple nodes is typically not an issue. However, as clusters keep growing and hundreds of thousands of GPUs need to work together, idle time across devices rapidly accumulates, causing overall utilization to plummet. The resulting inconsistencies within the software stack are now being exposed in xAI’s actual operations.
In a supercluster, the GPU chips themselves compute rapidly, but the bottleneck lies in the data read/write speed of high-bandwidth memory (HBM) and the communication overhead between thousands of servers. Even minor delays or network congestion in data transmission can force all GPUs in the cluster to idle, waiting for data to be loaded.
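To illustrate why small delays hurt more as a cluster grows, here is a toy model, not a description of xAI's actual system. It assumes fully synchronous training steps in which every GPU waits for the slowest one, and that each GPU independently suffers a stall with a small probability per step. All names and numbers (`stall_prob`, `stall_s`, `compute_s`) are hypothetical values chosen only for illustration.

```python
def expected_utilization(
    n_workers: int,
    compute_s: float = 1.0,    # assumed useful compute time per step (seconds)
    stall_s: float = 0.5,      # assumed extra delay when any worker stalls
    stall_prob: float = 1e-4,  # assumed per-worker chance of a stall in one step
) -> float:
    """Long-run fraction of time spent computing in a synchronous cluster."""
    # The step is delayed if at least one of the n workers stalls.
    p_any_stall = 1.0 - (1.0 - stall_prob) ** n_workers
    expected_step_s = compute_s + stall_s * p_any_stall
    return compute_s / expected_step_s


for n in (1_000, 10_000, 100_000, 500_000):
    print(f"{n:>7,} GPUs -> utilization ~{expected_utilization(n):.1%}")
```

Under these assumptions, a stall that is rare for any single GPU becomes nearly certain somewhere in the cluster at the largest scales, so almost every step pays the full delay.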
On the other hand, training AI models is typically intermittent. GPUs operate at full capacity during actual computations, but many devices remain idle while researchers analyze training results, adjust parameters, or process data pipelines.
Although 11% is clearly a low figure, The Information’s report also reveals some industry norms in AI: compute waste is widespread, and some researchers at large companies deliberately run meaningless training tasks to inflate utilization metrics. They do so out of fear of criticism from management and, frankly, to protect their team's own GPU quota from being reallocated to other teams.
Of course, this is not unique to xAI; it is a structural issue prevalent across the entire AI industry. Running AI infrastructure efficiently at such a massive scale is an extremely daunting challenge.

Running AI cloud infrastructure requires optimization across data, algorithms, models, computation, kernels, interactions (among humans, AI, and the world, as well as between agents), and the system as a whole, all of which involve extremely high engineering complexity.
Some tech giants have optimized their large-scale infrastructure stacks to achieve utilization rates exceeding 40%. Meta and Google are prime examples, with GPU utilization rates of 43% and 46%, respectively.
The challenges faced by xAI demonstrate that in today's AI arms race, acquiring GPUs is only the first step—using them effectively is what truly matters. Hardware scale has surpassed the scheduling capabilities of existing software architectures.
However, xAI is already addressing this issue and has set a target utilization rate of 50%. Although there is no definitive timeline yet, its core improvements will focus on optimizing infrastructure and the software stack. As future workloads gradually migrate to hardware platforms specifically designed to meet the demands of “agentic AI,” xAI is highly likely to offer its large GPU cluster for rental services.
Elon Musk is also betting on his in-house computing project, "TeraFab," to change the situation: on one hand, he is advancing multiple in-house chips to be integrated into xAI’s “AI chip family”; on the other hand, he aims to leverage Intel’s 14A process technology to develop cutting-edge solutions for xAI, SpaceX, and other related businesses.
xAI's predicament is a reminder to everyone else in the race: in the second half of the AI competition, what matters may no longer be who can buy more GPUs.
Reference:
https://www.theinformation.com/newsletters/ai-agenda/xai-shows-hard-use-lot-gpus
This article is from the WeChat public account "Machine Heart" (ID: almosthuman2014), authored by someone focused on AI infrastructure.
