According to Beating Monitor, DeepSeek, in collaboration with Peking University, has released a technical report on DSpark, a speculative sampling acceleration framework, and open-sourced the accompanying full-stack codebase, DeepSpec. DSpark is already deployed in DeepSeek-V4's production services. Without compromising output quality, it improves single-user generation speed by 60% to 85% for the Flash version and by 57% to 78% for the Pro version. DSpark also outperforms the previous MTP-1 baseline (multi-token prediction with a single draft token), significantly increasing overall system throughput under strict latency constraints.

Until now, multi-token speculative sampling has been difficult to deploy in live production environments. Autoregressive draft models are too slow, while parallel draft models predict each position independently, so acceptance rates for the later positions of a long draft are extremely low. Under high concurrency, blindly verifying multi-token drafts makes the large model spend substantial compute on tokens that are almost certain to be rejected, and system throughput collapses. As a result, industry practice has largely been limited to single-token drafts (MTP-1).

DSpark addresses this throughput collapse under high concurrency. It first uses DFlash, a parallel backbone network, to generate hidden states, and then applies an extremely lightweight Markov head that injects correlations between adjacent tokens at minimal cost, using only a table lookup and a single matrix multiplication. The system also integrates a confidence prediction head and a posterior calibration algorithm.
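To make the Markov head idea concrete, here is a minimal sketch in plain Python. It is not taken from the DSpark report or the DeepSpec codebase: the function name `markov_head_draft`, the projection matrix `proj`, and the `bigram_table` layout are all hypothetical, and the way the table lookup is combined with the matrix multiplication (simple addition of logits) is an assumption made for illustration only.

```python
def matvec(matrix, vector):
    """Multiply a (vocab x hidden) matrix by a hidden-state vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def markov_head_draft(hidden_states, prev_token, proj, bigram_table):
    """Greedy draft from hidden states that were produced in parallel.

    ASSUMPTION (illustrative, not the documented DSpark design):
    logits for position i = proj @ hidden_states[i] + bigram_table[previous token].
    The matrix product does not depend on earlier draft tokens, so it could be
    batched over all positions; only the cheap table lookup is sequential.
    """
    draft = []
    for h in hidden_states:
        base_logits = matvec(proj, h)        # position-wise projection
        bias = bigram_table[prev_token]      # lookup keyed by the previous token
        logits = [b + t for b, t in zip(base_logits, bias)]
        token = max(range(len(logits)), key=logits.__getitem__)
        draft.append(token)
        prev_token = token                   # next position conditions on this token
    return draft


# Toy setup: vocabulary of 3 tokens, hidden size 2.
proj = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
hidden_states = [[1.0, 0.9], [0.5, 0.5]]
no_correlation = [[0.0, 0.0, 0.0] for _ in range(3)]
after_0_prefer_1 = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

independent = markov_head_draft(hidden_states, 2, proj, no_correlation)   # [0, 0]
correlated = markov_head_draft(hidden_states, 2, proj, after_0_prefer_1)  # [0, 1]
```

With an all-zero table every position is predicted independently. A single non-zero table entry is enough to change the second draft token based on the first, which is the kind of adjacent-token correlation the article attributes to the Markov head.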
To stay compatible with the zero-overhead scheduling used in production and to avoid leaking future information, the scheduler works asynchronously: it decides how many candidate tokens to prune based on predictions from two steps earlier, so that the large model does not verify high-risk draft tails under heavy load.

Alongside DSpark, DeepSeek has open-sourced DeepSpec, a codebase that natively supports open-source large models such as Qwen3 and Gemma. DeepSpec provides a complete Python toolchain covering prompt downloading, large-model cache reconstruction, draft model training, and benchmark evaluation. Developers can use the open-source scripts directly to customize and deploy dedicated acceleration modules for different open-source large models locally.
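The asynchronous pruning performed by the scheduler can be illustrated with a small sketch. The names `prune_length` and `DelayedPruner`, the `min_survival` threshold, and the `warmup_len` fallback are hypothetical and do not come from DeepSpec. The sketch assumes the confidence head outputs a per-position acceptance probability, and that a draft token is only useful if every token before it is also accepted, so the chance of reaching a position is the product of the confidences up to that position.

```python
from collections import deque


def prune_length(confidences, min_survival=0.5):
    """Count leading draft tokens whose cumulative acceptance estimate
    (product of per-position confidences) stays at or above min_survival."""
    survival = 1.0
    keep = 0
    for c in confidences:
        survival *= c
        if survival < min_survival:
            break
        keep += 1
    return keep


class DelayedPruner:
    """Chooses the draft length for the current step from confidences that
    were produced `delay` steps earlier (ASSUMPTION: this is one reading of
    'predictions from two steps prior'; the real scheduler may differ)."""

    def __init__(self, max_draft_len, delay=2, min_survival=0.5, warmup_len=1):
        self.max_draft_len = max_draft_len
        self.min_survival = min_survival
        self.warmup_len = warmup_len
        # Placeholders stand in for steps that have no prediction yet.
        self.pending = deque([None] * delay)

    def step(self, new_confidences):
        """Record this step's confidences and return the draft length to verify."""
        self.pending.append(new_confidences)
        visible = self.pending.popleft()
        if visible is None:
            return self.warmup_len  # conservative fallback before predictions arrive
        return min(self.max_draft_len, prune_length(visible, self.min_survival))


pruner = DelayedPruner(max_draft_len=4)
lengths = [
    pruner.step([0.9, 0.8, 0.5, 0.9]),      # no earlier prediction yet -> 1
    pruner.step([0.99, 0.99, 0.99, 0.99]),  # still warming up -> 1
    pruner.step([0.1]),                     # uses the first prediction -> 2
    pruner.step([0.1]),                     # uses the second prediction -> 4
]
```

Because the decision for a step never depends on confidences produced in that same step, the scheduler does not have to wait for the draft model before planning the batch, which is the property the article describes as compatibility with zero-overhead scheduling.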
DeepSeek open-sources the DeepSpec framework, boosting the V4 model's speed by up to 85%.
MarsBit
DeepSeek has open-sourced the DeepSpec framework and launched the DSpark acceleration system, increasing DeepSeek-V4 speed by up to 85%. Without compromising quality, DSpark speeds up the Flash version by 60% to 85% and the Pro version by 57% to 78%. DSpark employs DFlash and a lightweight Markov head to reduce rejection rates and improve throughput. DeepSpec supports Qwen3 and Gemma and provides a complete Python toolchain for local deployment.