PyTorch integrates CuteDSL as the fourth matrix multiplication backend in TorchInductor

Summary

PyTorch has added CuteDSL as the fourth matrix multiplication backend in TorchInductor. It was selected for its low maintenance burden, lack of compilation slowdowns, and superior performance on key workloads. Developed by NVIDIA, CuteDSL provides Python-based kernel templates with compilation times comparable to the existing backends and significantly faster than the CUTLASS C++ path. The backend builds on the same abstractions as CUTLASS C++ and delivers strong results in FP8 GEMM and epilogue fusion. The team prioritized optimizing GEMM, the dominant computational cost in Transformer models. CuteDSL generates low-level code from hand-tuned templates, simplifying kernel development while fully exposing thread and memory hierarchies for architecture-specific features.
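The "epilogue fusion" the summary mentions refers to applying post-GEMM operations (such as a bias add and activation) while each output element is still in registers, rather than writing the matrix to memory and reading it back. A minimal pure-Python sketch of the idea (not CuteDSL code; the shapes and the bias-plus-ReLU epilogue are illustrative assumptions):

```python
def gemm_with_fused_epilogue(a, b, bias):
    """Multiply a (n x k) by b (k x m), fusing a bias add + ReLU epilogue.

    In a real fused kernel, the epilogue runs on the accumulator while it
    is still in registers, avoiding an extra round trip to global memory.
    """
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Fused epilogue: bias add + ReLU applied before the store.
            out[i][j] = max(acc + bias[j], 0.0)
    return out
```

The unfused alternative would materialize the raw GEMM result, then make a second pass for the bias and activation; fusion removes that extra memory traffic.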

ME News reports that on April 7 (UTC+8), the official PyTorch team announced the integration of CuteDSL as the fourth autotuning backend for matrix multiplication in TorchInductor. The backend was selected against three criteria: minimal additional maintenance overhead, no significant increase in compilation or benchmarking time, and improved performance on target workloads. CuteDSL, actively developed by NVIDIA, provides optimized kernel templates with compilation times comparable to the existing backends and significantly faster than the CUTLASS C++ path, which requires full `nvcc` compilation. Built on the same abstractions as CUTLASS C++ but written in Python, CuteDSL compiles faster and is simpler to maintain, and has demonstrated strong performance in FP8 GEMM and epilogue fusion. The team focused on optimizing GEMM (matrix multiplication) because it accounts for the primary computational cost in Transformer models. CuteDSL generates low-level code through hand-tuned templates, eliminating the complexity of writing kernels from scratch while fully exposing thread and memory hierarchies to support architecture-specific features. (Source: InfoQ)
