Hugging Face Launches Kernels Hub for Pre-Compiled GPU Operators

Summary
Hugging Face CEO Clem Delangue confirmed the official release of Kernels on the Hub on April 15 (UTC+8), a major milestone for developers. Kernels provides pre-compiled GPU operators that accelerate inference and training by 1.7 to 2.5 times. Developers can install these operators with a single line of code, with compilation handled in the cloud by Hugging Face; the Hub automatically matches the local hardware and delivers the files within seconds. Now a top-level repository type, Kernels includes 61 operators for common tasks and supports NVIDIA CUDA, AMD ROCm, Apple Metal, and Intel XPU. The announcement follows a beta phase that began in June 2025.

ME News reports that on April 15 (UTC+8), according to 1M AI News monitoring, Hugging Face CEO Clem Delangue announced the official launch of Kernels on the Hub. GPU kernels are low-level optimized routines that let graphics cards reach peak performance, accelerating inference and training by 1.7 to 2.5 times. Installation, however, has long been a pain point: compiling FlashAttention, the most widely used kernel, requires roughly 96 GB of RAM and several hours locally, and even minor mismatches in PyTorch or CUDA versions produce errors, so many developers get stuck at this step.

Kernels Hub moves compilation to the cloud. Hugging Face has pre-compiled kernels for a range of GPU and system environments; developers need only write one line of code, and the Hub automatically detects the hardware environment and downloads the matching pre-compiled files for immediate use within seconds. Multiple kernel versions can be loaded in the same process, with full compatibility with torch.compile.

Kernels was first launched in testing in June 2025 and has this month been promoted to a first-class repository type on the Hub, alongside Models, Datasets, and Spaces. Currently, 61 pre-compiled kernels are available, covering common use cases such as attention mechanisms, normalization, mixture-of-experts routing, and quantization. They support four hardware acceleration platforms, NVIDIA CUDA, AMD ROCm, Apple Metal, and Intel XPU, and have been integrated into Hugging Face's inference framework TGI and the Transformers library. (Source: BlockBeats)
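To make the "detect hardware, then fetch a matching pre-built file" step concrete, here is a minimal illustrative sketch in plain Python. This is not Hugging Face's actual implementation: the `BuildVariant` type, field names, and `pick_variant` helper are invented for illustration, and the URLs are placeholders. It only shows the general idea of matching a host environment (backend, framework version, CPU architecture) against a registry of pre-compiled build variants.

```python
# Illustrative sketch only -- not Hugging Face's real code or API.
# Shows how a hub might match a host environment to a pre-compiled
# kernel build variant instead of compiling locally.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BuildVariant:
    backend: str    # e.g. "cuda", "rocm", "metal", "xpu"
    framework: str  # e.g. "torch2.4" (framework + version the build targets)
    arch: str       # e.g. "x86_64", "aarch64"
    url: str        # hypothetical download location for the pre-built binary


def pick_variant(available: list[BuildVariant],
                 backend: str, framework: str, arch: str) -> Optional[BuildVariant]:
    """Return the first pre-built variant matching the host, or None
    if no compatible build exists (the case where local compilation
    would otherwise be required)."""
    for v in available:
        if (v.backend, v.framework, v.arch) == (backend, framework, arch):
            return v
    return None


# A toy registry covering the four backends named in the article.
variants = [
    BuildVariant("cuda", "torch2.4", "x86_64", "https://example.invalid/cuda.bin"),
    BuildVariant("rocm", "torch2.4", "x86_64", "https://example.invalid/rocm.bin"),
    BuildVariant("metal", "torch2.4", "aarch64", "https://example.invalid/metal.bin"),
    BuildVariant("xpu", "torch2.4", "x86_64", "https://example.invalid/xpu.bin"),
]

# A host with an AMD GPU on x86_64 gets the ROCm build, with no compile step.
match = pick_variant(variants, "rocm", "torch2.4", "x86_64")
```

The key design point is that the registry, not the developer, owns the compatibility matrix: a strict exact-match lookup fails loudly when no pre-built file fits, which is preferable to the version-mismatch errors the article describes with local FlashAttention builds.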
