According to Beating Monitoring, Alibaba's Tongyi Qianwen has officially released its new flagship agent foundation model, Qwen3.7-Max. Official real-world test data shows that, without access to any chip architecture documentation or performance profiling data, the model achieved a 10.0x improvement in Triton operator performance on the domestic Pingtouge Zhenwu M890 processor during a fully autonomous kernel optimization task that lasted 35 hours and involved 1,158 tool calls.

During optimization, the model went through five core evolutionary stages. First, it used Split-K to partition the prefix KV-cache along the token dimension so that all 36 SM cores were fully utilized. Next, it replaced cudaMalloc calls, which caused host-device synchronization, with pre-allocated PyTorch variables, and it eliminated every synchronous cudaMemcpy call used to query prefix lengths by reading tensor metadata instead, completely removing host-device communication overhead. In the final stage, the model restructured the operator to process all four query tokens within a single thread block, sharing memory loads to amortize memory access costs, a critical architecture-specific refinement.

In the real-world operator optimization results, Qwen3.7-Max achieved a 10.0x geometric mean speedup, significantly outperforming GLM 5.1 (7.3x) and Kimi K2.6 (5.0x). DeepSeek V4 Pro, by contrast, achieved only a 3.3x speedup and terminated the task prematurely after five consecutive rounds with no tool calls.

To learn generalizable problem-solving strategies in dynamic environments, Qwen3.7-Max decoupled tasks, execution frameworks, and validators during training and used cross-framework reinforcement learning to avoid shortcut overfitting to specific benchmarks. On the general-purpose agent benchmarks MCP-Mark (60.8) and SpreadSheetBench (87.0), Qwen3.7-Max showed strong generalization, with overall performance closely approaching that of Claude-4.6-Opus-Max.
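To illustrate the Split-K step described above, here is a minimal pure-Python sketch of how single-query attention over a KV-cache can be split along the token dimension and then merged exactly. It is not the kernel the model produced, which the article does not show: the function names, the single-query setup, and the choice of 36 chunks (echoing the 36 SM cores mentioned in the report) are all assumptions made for illustration. Each chunk returns its local maximum score, softmax denominator, and unnormalized output, and a final reduction rescales and combines them.

```python
import math
import random


def attention_reference(q, keys, values):
    """Single-query softmax attention over the whole KV-cache in one pass."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def partial_attention(q, keys, values):
    """Process one token chunk (hypothetical helper).

    Returns (local max score, local softmax denominator, unnormalized output).
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), acc


def split_k_attention(q, keys, values, num_splits):
    """Split the KV-cache along the token dimension and merge the partials."""
    n = len(keys)
    chunk = math.ceil(n / num_splits)
    # On a GPU each chunk would map to its own thread block; here we loop.
    partials = [partial_attention(q, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, n, chunk)]

    global_max = max(m for m, _, _ in partials)
    dim = len(values[0])
    denom = 0.0
    out = [0.0] * dim
    for m, d, acc in partials:
        w = math.exp(m - global_max)  # rescale chunk to the global max
        denom += w * d
        for j in range(dim):
            out[j] += w * acc[j]
    return [x / denom for x in out]


# Small deterministic demo: 1,000 cached tokens, head dimension 8.
rng = random.Random(0)
q = [rng.uniform(-1, 1) for _ in range(8)]
keys = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(1000)]
values = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(1000)]

ref = attention_reference(q, keys, values)
split = split_k_attention(q, keys, values, num_splits=36)
max_abs_diff = max(abs(a - b) for a, b in zip(ref, split))
```

Because the merge rescales every chunk by the exponential of its local maximum minus the global maximum, the split result matches the single-pass result up to floating-point rounding. That is what lets the token dimension be spread across many cores without changing the output.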
Qwen3.7-Max Achieves 10x Performance Improvement on Domestic Chip in 35-Hour Optimization Task
MarsBit

Alibaba's Qwen3.7-Max has launched as the new flagship agent foundation model, achieving a 10.0x Triton operator performance improvement on the Pingtouge Zhenwu M890 processor during a 35-hour autonomous optimization task. Working without any chip architecture documentation, the model outperformed GLM 5.1 and Kimi K2.6. Key optimizations included splitting the prefix KV-cache and restructuring the operator. The results underscore the potential of AI-driven kernel optimization to deliver significant performance gains.