SGLang and AMD collaborate to optimize DeepSeek-R1 inference on the MI355X GPU.
SGLang and AMD have optimized DeepSeek-R1 inference on the MI355X GPU, reaching a total cost of $0.169 per million tokens at 129 tokens/s per user. That is 5% cheaper than NVIDIA B200 running Dynamo TRT-LLM and 40% cheaper than B200 running SGLang. On 24 MI355X GPUs, throughput reached 2,436 tokens/s per GPU, 1.25x that of B200 with SGLang on 48 GPUs. Key improvements include MoRI mixed FP4/FP8 quantization, MoRI-IO KV Cache, batch overlap with SDMA, ROCm Specv2 MTP, and CPU streaming.
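The comparison figures above can be unpacked with a little arithmetic. The sketch below is illustrative only: it assumes that "X% cheaper" means the MI355X cost equals (1 - X) times the baseline cost, and that "1.25x" compares per-GPU throughput. Neither reading is confirmed by the report, and the function names are hypothetical.

```python
# Figures quoted in the report for MI355X.
MI355X_COST_PER_M_TOKENS = 0.169   # USD per million tokens
MI355X_TOKENS_PER_S_PER_GPU = 2436
MI355X_GPU_COUNT = 24


def implied_baseline_cost(cost: float, fraction_cheaper: float) -> float:
    """Baseline cost, ASSUMING cost = (1 - fraction_cheaper) * baseline."""
    return cost / (1.0 - fraction_cheaper)


def implied_baseline_throughput(throughput: float, ratio: float) -> float:
    """Baseline per-GPU throughput, ASSUMING throughput = ratio * baseline."""
    return throughput / ratio


# Cluster-wide throughput across all 24 GPUs.
aggregate_tokens_per_s = MI355X_TOKENS_PER_S_PER_GPU * MI355X_GPU_COUNT

# Baselines implied by the "5% cheaper" and "40% cheaper" claims.
b200_trtllm_cost = implied_baseline_cost(MI355X_COST_PER_M_TOKENS, 0.05)
b200_sglang_cost = implied_baseline_cost(MI355X_COST_PER_M_TOKENS, 0.40)

# Per-GPU throughput implied for B200 with SGLang by the "1.25x" claim.
b200_sglang_tokens_per_s = implied_baseline_throughput(
    MI355X_TOKENS_PER_S_PER_GPU, 1.25
)

if __name__ == "__main__":
    print(f"MI355X aggregate throughput: {aggregate_tokens_per_s} tokens/s")
    print(f"Implied B200 (Dynamo TRT-LLM) cost: ${b200_trtllm_cost:.3f}/M tokens")
    print(f"Implied B200 (SGLang) cost: ${b200_sglang_cost:.3f}/M tokens")
    print(f"Implied B200 (SGLang) throughput: {b200_sglang_tokens_per_s:.1f} tokens/s/GPU")
```

Under these assumptions, the implied B200 costs come to roughly $0.178 (Dynamo TRT-LLM) and $0.282 (SGLang) per million tokens. These are derived values, not figures published in the report.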
Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information.
Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.