AI compute supply chain bottlenecks are shifting from GPUs to power and cooling

Author: qinbafrank

In February, in “What Does This War of Capital Expenditure Mean?”, we discussed how key segments in the computing power supply chain—such as chips, packaging and testing, storage, and optical modules—continue to capture the greatest value; those with capacity that cannot be rapidly expanded and those with extremely high moats will benefit from the surge in capital expenditure.

There is still significant room for efficiency optimization: techniques such as distillation, quantization, MoE, specialized chips, liquid cooling, and fusion (long-term) could potentially reduce energy consumption and cost per unit of computing power by another 10 to 100 times. Seek opportunities in these areas.

Recently, multiple investment banks—including Morgan Stanley, JPMorgan Chase, Bank of America, Goldman Sachs, UBS, Citigroup, Bernstein, and HSBC—released updated reports on AI, semiconductors, power, and storage. The bottleneck in AI hardware has expanded beyond just GPU supply to a collective strain across five dimensions: power, chips, storage, equipment, and materials.

The demand for AI has surpassed all previous forecasts in traditional power planning, semiconductor equipment capacity, storage pricing models, and robotics installation assumptions.

Morgan Stanley’s global thematic research review highlights that global weekly large language model token consumption surged from 6.4 trillion to 22.7 trillion over three months, an increase of roughly 2.5 times, while the U.S. faces a 55-gigawatt power gap for data centers between 2025 and 2028. JPMorgan’s initiation of coverage on debt financing for high-performance computing data centers cites a funding shortfall of 122 gigawatts over the next five years; U.S. five-year power planning has jumped from 101 gigawatts to 230 gigawatts, with 44% of new projects facing grid connection wait times exceeding four years. In Bank of America’s latest target price report on Alphabet, the 2026 capital expenditure estimate was raised to $181.5 billion, roughly double the prior year, while free cash flow falls by 62%. These three sets of data are not outputs of a single framework; they are independent assessments from three separate institutions following distinct research pathways.
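As a quick sanity check on the growth figures quoted above, the arithmetic can be sketched as follows. This is a minimal illustration: the helper name `growth` is hypothetical, and the inputs are simply the numbers cited in the text.

```python
def growth(start: float, end: float) -> tuple[float, float]:
    """Return (multiple of the starting level, percent increase)."""
    multiple = end / start
    return multiple, (multiple - 1.0) * 100.0


# Figures quoted in the text above.
token_multiple, token_pct = growth(6.4, 22.7)    # weekly tokens, in trillions
plan_multiple, plan_pct = growth(101.0, 230.0)   # five-year power planning, in GW

print(f"tokens:     {token_multiple:.2f}x the starting level (+{token_pct:.0f}%)")
print(f"power plan: {plan_multiple:.2f}x the starting level (+{plan_pct:.0f}%)")
```

Token consumption ends at about 3.5 times its starting level, which is the same thing as an increase of roughly 2.5 times; the power planning figure slightly more than doubles.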

The evolution of bottlenecks in the semiconductor supply chain, particularly in AI computing, has followed a clear sequence: “computing (GPUs) → storage (HBM, etc.) → optical interconnects → power / liquid cooling.” This is the industry consensus for 2025–2026. As AI training and inference clusters scale from single racks (tens of GPUs) to hyperscale deployments (thousands to hundreds of thousands of GPUs), resolving each bottleneck immediately exposes the next physical or supply chain constraint, creating a “Leontief-style” complementary constraint (where the absence of any one component prevents shipment).
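The "Leontief-style" constraint can be made concrete with a small sketch: deployable output is the minimum, across all inputs, of available supply divided by the amount required per unit. Every number below is hypothetical and chosen only for illustration, and the function name `deployable_racks` is likewise an assumption rather than anything from the cited reports.

```python
def deployable_racks(supply: dict[str, float],
                     per_rack: dict[str, float]) -> tuple[int, str]:
    """Leontief-style output: racks = min over inputs of supply / requirement."""
    ratios = {name: supply[name] / need for name, need in per_rack.items()}
    binding = min(ratios, key=ratios.get)  # the input that runs out first
    return int(ratios[binding]), binding


# Hypothetical per-rack requirements and available supply (illustrative only).
per_rack = {"gpus": 72, "hbm_tb": 14, "optical_modules": 150, "power_kw": 130}
supply = {"gpus": 72_000, "hbm_tb": 12_600,
          "optical_modules": 180_000, "power_kw": 104_000}

racks, bottleneck = deployable_racks(supply, per_rack)
print(racks, bottleneck)

# Relieving the binding constraint exposes the next one.
relieved = {**supply, "power_kw": 200_000}
print(deployable_racks(relieved, per_rack))
```

In this made-up scenario power is the binding input; once more power is added, the limit moves to HBM rather than disappearing, which is the pattern the article describes.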


It is important to understand why this evolution has occurred, the current state of affairs, and the underlying physical/engineering reasons:

1. First-stage bottleneck: GPU computing (dominant from 2022–2024)

Core limitation: Wafer capacity for high-end GPUs (such as NVIDIA Hopper H100 → Blackwell B200 → Rubin), plus advanced packaging.

Why it’s a bottleneck: Large AI models require massively parallel computing, and capacity for TSMC’s 4nm/3nm/2nm logic processes combined with CoWoS (2.5D/3D packaging) was for a time the biggest constraint. Even with sufficient front-end wafer supply, complete GPUs cannot be produced if the back end cannot keep up with packaging the logic dies together with HBM stacks.

Relief: TSMC is significantly expanding CoWoS capacity (doubling production in 2024–2025), and NVIDIA Blackwell has shipped at scale. However, this only relieves the “compute” bottleneck and immediately exposes new issues downstream.

2. Second-stage bottleneck: Storage (HBM, high-bandwidth memory; expected to be the scarcest component in 2024–2025)

Core constraint: HBM3/HBM3e/HBM4 production capacity.

Why the bottleneck passed to storage: While GPU computing power has increased, model sizes have exploded to trillions or even tens of trillions of parameters, so data movement (memory bandwidth) has become the “memory wall.” HBM can transfer several terabytes of data per second, more than 20 times faster than conventional DDR memory. Because HBM sits close to the logic chip, data does not need to travel far, which significantly reduces energy consumption.

Each B200 GPU requires 192GB+ of HBM3e, which puts GPU-attached HBM at roughly 14 TB per NVL72 rack; the rack’s total fast memory (HBM plus CPU-attached memory) has reached 30–40 TB, with bandwidth demands far exceeding those of traditional DRAM.
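The per-GPU figure can be scaled up to a rack and to a cluster as a back-of-the-envelope check. The sketch below assumes exactly 192 GB per GPU, 72 GPUs per NVL72 rack, and decimal units (1 TB = 1000 GB); the 100,000-GPU cluster size and the helper name `hbm_total_tb` are hypothetical.

```python
GB_PER_TB = 1000  # decimal units, an assumption of this sketch


def hbm_total_tb(gpus: int, hbm_gb_per_gpu: float = 192.0) -> float:
    """Total GPU-attached HBM in TB for a given number of GPUs."""
    return gpus * hbm_gb_per_gpu / GB_PER_TB


rack_tb = hbm_total_tb(72)            # one NVL72 rack
cluster_tb = hbm_total_tb(100_000)    # hypothetical 100,000-GPU cluster

print(f"per rack:    {rack_tb:.1f} TB of HBM")
print(f"per cluster: {cluster_tb / 1000:.1f} PB of HBM")
```

Under these assumptions GPU-attached HBM alone comes to about 13.8 TB per rack, so larger per-rack totals would have to count other memory in the rack or higher-capacity parts. At cluster scale the requirement runs to tens of petabytes, which helps explain why three suppliers can sell out a full year of production.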

Current supply chain status: Only SK Hynix, Samsung, and Micron can produce HBM at scale; the process is complex (involving TSV and stacking). All HBM production for 2025 has been sold out, and demand will still exceed supply in 2026, with prices surging 246% year-over-year. Even when GPU chips are ready, the absence of HBM prevents assembly and delivery, causing delays in the deployment of entire AI clusters.

Result: Storage has shifted from a commodity to a strategic bottleneck, accounting for up to 30% of capital expenditures.

3. Third-stage bottleneck: Optical interconnects (transition underway in 2025–2026)

Core limitation: Physical constraints of copper cables (NVLink/NVSwitch) in terms of bandwidth, distance, power consumption, and weight.

Why optical is inevitable: Within a single rack (72 GPUs), copper cables may suffice, but when scaling to multiple racks or thousands of GPUs, copper runs into severe signal attenuation (effective distance <1 meter at 1.8 TB/s bandwidth) and excessive weight (over 5,000 copper cables in an NVL72 rack, totaling 1.36 tons). Conventional pluggable optical modules are not a free substitute either: replacing the in-rack copper with them would add roughly 20 kW of power. With copper, signal integrity, latency, and thermal management can no longer support larger clusters.

Solution: Transition to optical interconnects (CPO—Co-Packaged Optics—and silicon photonics). Integrate the optical engine directly beside the GPU/ASIC and use fiber optics for scale-out, achieving higher bandwidth density, lower power per bit, and longer transmission distances.


NVIDIA made optical interconnects a major bet at GTC 2026 and has invested in optical companies, and demand for 800G/1.6T optical modules has exploded. Lumentum (LITE), Broadcom, Coherent, Ayar Labs, and others have become the new winners.

Current status: Copper cabling has reached its limit; optical interconnects are transitioning from an option to a necessity, breaking through the performance ceiling of AI data centers.

4. Fourth-stage bottleneck (current cutting edge): Power + liquid cooling (becoming the ultimate physical constraint starting in 2026)

Core limitations: Power wall, thermal wall, and grid connectivity.

Why it’s the ultimate bottleneck: Each GPU’s power consumption has risen from 300 W to 700–1,200 W, causing single-rack power demand to surge from 10–20 kW (in the CPU era) to 120–200 kW or even higher. Traditional air cooling has a practical upper limit of only 20–50 kW per rack, beyond which noise, airflow, and energy consumption become unacceptable.
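A rough calculation shows how per-GPU power translates into rack power and where air cooling stops being viable. This is a sketch under stated assumptions: the 30% overhead for CPUs, switches, fans, and power conversion is hypothetical, the 50 kW threshold is the upper end of the air-cooling range quoted above, and a 72-GPU rack is assumed at every power level purely for comparison.

```python
AIR_COOLING_LIMIT_KW = 50.0  # upper end of the 20–50 kW range quoted above


def rack_power_kw(gpus: int, watts_per_gpu: float,
                  overhead_fraction: float = 0.3) -> float:
    """GPU power plus an assumed overhead for the rest of the rack, in kW."""
    return gpus * watts_per_gpu * (1.0 + overhead_fraction) / 1000.0


def needs_liquid_cooling(rack_kw: float,
                         limit_kw: float = AIR_COOLING_LIMIT_KW) -> bool:
    """True when the rack exceeds the assumed air-cooling limit."""
    return rack_kw > limit_kw


for watts in (300, 700, 1200):
    kw = rack_power_kw(72, watts)
    mode = "liquid" if needs_liquid_cooling(kw) else "air"
    print(f"{watts:>5} W/GPU -> {kw:6.1f} kW/rack -> {mode} cooling")
```

With these assumptions a 72-GPU rack crosses the air-cooling limit well before the 1,200 W per-GPU level, and at 1,200 W it lands above 110 kW, in the neighborhood of the rack figures cited in the text.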

Power side: Data centers require gigawatt-level power supply; grid connection waiting lists can extend for years, and delivery cycles for equipment such as transformers and solid-state transformers have stretched to 100 weeks. Microsoft’s CEO has said, in effect, that the company has GPUs but not the power to plug them in.

Liquid cooling side: Racks must transition to direct-to-chip (D2C) liquid cooling or immersion cooling, combined with technologies such as microfluidics and cold plates. TSMC has demonstrated silicon-based liquid cooling on the CoWoS platform, supporting >2.6 kW TDP. Liquid cooling and thermal management providers such as Vertiv (VRT) are becoming the new core of infrastructure.

Cascading effect: PUE (Power Usage Effectiveness) requirements of less than 1.2 have pushed waste heat recovery and the grid integration of nuclear and other new energy sources onto the agenda. Even if all the previous bottlenecks are resolved, server racks cannot be deployed or operated without electricity and cooling.
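PUE is the ratio of total facility power to IT equipment power, so the non-IT overhead (cooling, power distribution losses, and so on) equals IT power times (PUE - 1). The sketch below applies that definition to a hypothetical 1,000 MW IT load; the comparison PUE of 1.5 and the function names are assumptions for illustration.

```python
def facility_power_mw(it_power_mw: float, pue: float) -> float:
    """Total facility power implied by PUE = total power / IT power."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_power_mw * pue


def overhead_mw(it_power_mw: float, pue: float) -> float:
    """Power spent on everything other than IT equipment."""
    return facility_power_mw(it_power_mw, pue) - it_power_mw


IT_LOAD_MW = 1000.0  # hypothetical gigawatt-scale IT load

for pue in (1.5, 1.2):
    print(f"PUE {pue}: total {facility_power_mw(IT_LOAD_MW, pue):.0f} MW, "
          f"overhead {overhead_mw(IT_LOAD_MW, pue):.0f} MW")

saved = overhead_mw(IT_LOAD_MW, 1.5) - overhead_mw(IT_LOAD_MW, 1.2)
print(f"overhead avoided by moving from 1.5 to 1.2: {saved:.0f} MW")
```

At gigawatt scale, each 0.1 of PUE corresponds to about 100 MW of grid capacity, which is why cooling efficiency and power sourcing are treated as one problem.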


The underlying logic of the shift in bottlenecks within the AI computing supply chain: AI computing is not a "single-point" issue but a system-level Leontief production function, in which GPU, HBM, interconnects, power, and cooling must all be matched to the weakest link. Each time a hyperscaler (such as Google, Microsoft, or Meta) resolves one bottleneck, it immediately redirects capital and innovation to the next link in the chain.

As of 2026, the industry is in a transition phase marked by the accelerated adoption of optical interconnects alongside the large-scale commercialization of power and liquid cooling. New bottlenecks may still emerge, for example in lasers, optical fiber materials, or power grid transformers, but the sequence of “compute → storage → optical → power/cooling” has become an industry-recognized pathway.

This also explains why the investment thesis has shifted from NVIDIA/TSMC to the three HBM leaders (SK Hynix, etc.), optical manufacturers (Lumentum, Coherent), and liquid cooling/power infrastructure providers (Vertiv, related power companies).

Each bottleneck shift is reshaping the value distribution across the entire semiconductor and data center industry chain.