Google Launches 8th-Gen TPU with Separate Training and Inference Chips

ME News reports that on April 22 (UTC+8), according to Beating's monitoring, Google CEO Sundar Pichai unveiled the eighth-generation TPU at Cloud Next 2026, splitting training and inference into two separate chips for the first time.

The TPU 8t is designed for training. A single super-node connects 9,600 chips and delivers 121 ExaFlops of compute with 2 PB of shared high-bandwidth memory. Google puts this at three times the performance of the previous-generation Ironwood, with up to twice the energy efficiency. Inter-chip bandwidth has doubled, and with the newly introduced Virgo network topology, up to one million chips can be linked into a single logical cluster with near-linear scaling. Google aims to cut the development cycle for frontier models from months to weeks.

The TPU 8i is designed for inference. A single pod connects 1,152 chips, each with 288 GB of high-bandwidth memory and 384 MB of on-chip SRAM, three times that of Ironwood, so that active model data stays resident on the chip. The new Boardfly network topology significantly reduces latency. Google claims the chip can serve nearly twice as many customers at the same cost, and is targeting support for millions of agents running simultaneously.

Both chips use Google’s custom Arm-based Axion CPU as the host processor and are paired with fourth-generation liquid cooling. They are scheduled for official release on the Google Cloud AI Hypercomputer platform later in 2026, offered alongside NVIDIA GPU instances. (Source: BlockBeats)
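
For readers who want to put the headline figures on a per-chip footing, the arithmetic can be sketched as below. This is a rough back-of-the-envelope calculation, not an official specification: it assumes decimal units (1 PB = 1,000,000 GB), it reads the 288 GB figure for the TPU 8i as per chip, and the function names are hypothetical. The report does not say which number format the 121 ExaFlops figure refers to, so the per-chip result should not be compared directly with other vendors' numbers.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the report.
# Assumptions (not stated in the report): decimal SI units, and the 288 GB
# HBM figure for the TPU 8i is per chip.

TPU8T_CHIPS = 9_600          # chips per TPU 8t super-node
TPU8T_EXAFLOPS = 121         # quoted compute per super-node
TPU8T_SHARED_HBM_PB = 2      # quoted shared HBM per super-node

TPU8I_CHIPS = 1_152          # chips per TPU 8i pod
TPU8I_HBM_GB = 288           # HBM, read here as per chip (assumption)


def per_chip_petaflops(total_exaflops: float, chips: int) -> float:
    """Average compute per chip in petaFLOPS (1 ExaFlop = 1,000 PetaFlops)."""
    return total_exaflops * 1_000 / chips


def per_chip_hbm_gb(total_pb: float, chips: int) -> float:
    """Average HBM per chip in GB (1 PB = 1,000,000 GB, decimal units)."""
    return total_pb * 1_000_000 / chips


def pod_hbm_tb(per_chip_gb: float, chips: int) -> float:
    """Aggregate HBM across a pod in TB (1 TB = 1,000 GB)."""
    return per_chip_gb * chips / 1_000


if __name__ == "__main__":
    print(f"TPU 8t compute per chip: "
          f"{per_chip_petaflops(TPU8T_EXAFLOPS, TPU8T_CHIPS):.1f} PFLOPS")
    print(f"TPU 8t HBM per chip: "
          f"{per_chip_hbm_gb(TPU8T_SHARED_HBM_PB, TPU8T_CHIPS):.0f} GB")
    print(f"TPU 8i HBM per pod: "
          f"{pod_hbm_tb(TPU8I_HBM_GB, TPU8I_CHIPS):.1f} TB")
```

Under these assumptions, a TPU 8t chip averages roughly 12.6 petaFLOPS and about 208 GB of HBM, while a full TPU 8i pod would hold roughly 332 TB of HBM in aggregate.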