Cursor Multi-Agent System Optimizes 235 NVIDIA GPU Operators in Three Weeks, Approaching Hardware Limits

Summary

On April 15 (UTC+8), the AI programming tool Cursor announced a collaboration with NVIDIA using its multi-agent system. Over three weeks, the system optimized 235 real-world GPU operators extracted from 124 open-source models, running on 27 Blackwell B200 GPUs and achieving a 38% geometric mean speedup. 149 operators (63%) outperformed their baselines, with 45 (19%) demonstrating over 2x acceleration. Key improvements included 84% faster BF16 grouped query attention and 39% faster NVFP4 MoE layer operations. Cursor noted GPU resource constraints and plans to integrate the multi-agent technology into its core product.

BlockBeats reports that on April 15 (UTC+8), the AI programming tool Cursor disclosed a multi-agent collaboration experiment with NVIDIA. The system ran autonomously for three weeks on 27 Blackwell B200 GPUs, tackling 235 real-world operator optimization problems extracted from 124 production-grade open-source models, including DeepSeek, Qwen, and Gemma. It generated and optimized GPU operator code from scratch, achieving an overall geometric mean speedup of 38%.

GPU operator optimization is among the most challenging domains in software engineering: it requires mastery of chip architecture, assembly-level instructions, and memory scheduling, and a high-performance operator typically takes senior experts months or even years to refine. Cursor's multi-agent system handled all 235 problems simultaneously: one planning agent assigned tasks and dynamically rescheduled them based on performance metrics, while multiple worker agents optimized in parallel. The system autonomously invoked NVIDIA's SOL-ExecBench benchmarking pipeline, forming an automated "test-debug-optimize" loop with zero human intervention.

The experiment ran two rounds in two different languages: CUDA C (with inline PTX assembly), to test raw low-level hardware reasoning, and CuTe DSL, to test the system's ability to learn new APIs that rarely appear in public training data. Of the 235 problems, the system outperformed baselines on 149 (63%), with 45 (19%) achieving over 2x speedup. Three representative results:

1. BF16 grouped query attention (extracted from a Llama 3.1 8B inference scenario): 84% faster than the manually optimized FlashInfer library, with an SOL score of 0.9722, nearly reaching the hardware's theoretical limit (a perfect score is 1.0).
2. BF16 matrix multiplication: the automatically generated operator reached 86% of the performance of NVIDIA's hand-tuned cuBLAS and outperformed the baseline by up to 9% in the small-M shapes common in LLM decoding.
3. NVFP4 linear operations in Mixture-of-Experts layers (extracted from MoE models such as Qwen3): the system autonomously identified bottlenecks in 4-bit floating-point quantization and applied targeted fusion optimizations, achieving a 39% speedup.

Cursor acknowledged that the overall median SOL score was only 0.56, leaving significant room for improvement, which it attributed primarily to limited GPU resources (27 GPUs shared across all 235 tasks). Cursor stated that these multi-agent technologies "will be integrated into core products very soon." An IDE company's AI agent has now approached the performance of top human experts at assembly-level GPU optimization, far surpassing the narrative of "helping you write application code." (Source: BlockBeats)
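The 38% headline figure is a geometric mean of per-operator speedup ratios, which keeps a few extreme outliers from dominating the aggregate. A minimal sketch of that computation (the example ratios are made up for illustration, not taken from the experiment):

```python
import math

def geomean_speedup(speedups):
    """Geometric mean of per-operator speedup ratios.

    Each ratio is baseline_time / optimized_time for one operator;
    a 38% geometric-mean speedup over N operators means the product
    of the ratios, taken to the 1/N power, equals 1.38.
    """
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

# Illustrative (made-up) ratios: some operators regress (< 1.0),
# some exceed 2x, and the aggregate lands in between.
ratios = [0.9, 1.1, 1.4, 2.3, 1.5]
print(round(geomean_speedup(ratios), 3))
```

Unlike an arithmetic mean, one 10x outlier cannot single-handedly pull the aggregate far upward, which makes the 38% figure a conservative way to summarize 235 heterogeneous tasks.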
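The "test-debug-optimize" loop can be pictured as a propose-measure-keep cycle per operator. The sketch below is a hypothetical illustration of that control flow, not Cursor's implementation; `propose` and `benchmark` are stand-ins for the agent's code generation and for NVIDIA's SOL-ExecBench pipeline:

```python
def optimize_operator(baseline_time, propose, benchmark, rounds=10):
    """Hypothetical sketch of a per-operator test-debug-optimize loop.

    propose(best_src)  -> a new candidate kernel (agent writes/edits code)
    benchmark(src)     -> (correct: bool, time: float), i.e. compile,
                          verify numerics, and measure the candidate
    Only correct candidates that strictly beat the incumbent are kept.
    """
    best_src, best_time = None, baseline_time
    for _ in range(rounds):
        candidate = propose(best_src)       # generate or refine a kernel
        ok, t = benchmark(candidate)        # automated test + timing
        if ok and t < best_time:            # keep strict improvements only
            best_src, best_time = candidate, t
    return best_src, baseline_time / best_time  # final speedup ratio
```

In the article's setup, a planning agent would run many such loops in parallel across worker agents and reschedule GPU time toward the operators showing the most improvement.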
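Grouped query attention, the operator behind the headline 84% result, lets several query heads share one key/value head, shrinking the KV-cache traffic that dominates decode-time attention and making the kernel memory-bandwidth-bound. A minimal NumPy sketch of the math (fp32, no masking; the real kernel runs fused in BF16 on the GPU):

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Minimal grouped query attention sketch.

    q: (n_q_heads, seq, d);  k, v: (n_kv_heads, seq, d).
    Each group of n_q_heads // n_kv_heads query heads shares one
    KV head, so far less K/V data must stream from memory than in
    standard multi-head attention.
    """
    n_q_heads, seq, d = q.shape
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kh = h // group                          # shared KV head index
        scores = q[h] @ k[kh].T / np.sqrt(d)     # scaled dot product
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)       # softmax over keys
        out[h] = w @ v[kh]
    return out
```

With `n_kv_heads == n_q_heads` this reduces to ordinary multi-head attention; Llama 3.1 8B uses 32 query heads over 8 KV heads, a 4:1 grouping.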
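NVFP4 stores values in the 4-bit FP4 (E2M1) format, which can represent only 8 magnitudes (plus sign), with a small shared scale per block of elements. The toy sketch below quantizes one block against that magnitude table; it illustrates the numeric format only and simplifies the real scheme (which uses FP8 E4M3 block scales over 16-element blocks) by keeping the scale in fp32:

```python
import numpy as np

# The 8 non-negative magnitudes representable in FP4 E2M1;
# the sign bit doubles this to 16 possible codes.
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_block(x):
    """Toy per-block FP4 quantization sketch (not NVIDIA's kernel)."""
    scale = np.abs(x).max() / FP4_E2M1[-1]   # map the block max to 6.0
    if scale == 0.0:                         # all-zero block
        scale = 1.0
    scaled = x / scale
    # round-to-nearest against the FP4 magnitude table
    idx = np.abs(np.abs(scaled)[:, None] - FP4_E2M1).argmin(axis=1)
    q = np.sign(scaled) * FP4_E2M1[idx]
    return q, scale                          # dequantize as q * scale
```

The coarse 8-magnitude grid is why quantize/dequantize steps cluster around MoE linear layers, and why fusing them into the surrounding matmul, as the article describes, removes a real bottleneck rather than a rounding detail.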
