ME News reports that on May 14 (UTC+8), according to monitoring by Beating, Nous Research unveiled a new pretraining method for large models called Token Stacking Training (TST). The approach compresses adjacent tokens into bundles during the early stages of training and is reported to cut pretraining time by a factor of 2 to 3 under the same computational budget. TST has two phases. During the first 20% to 40% of training, the model does not process tokens individually; instead, it averages adjacent tokens into bundles and predicts which tokens will appear in the next bundle, without preserving their internal order. The model then reverts to conventional next-token prediction for the rest of training. Because the underlying architecture is unchanged, the resulting model behaves identically to a standard model at inference time. The method has been validated on MoE models with up to 10 billion parameters. At its core, the technique trades data for compute: it speeds up training by consuming training data faster. If high-quality text corpora become scarce in the future, this data-intensive nature could become a limitation. Additionally, hours after the paper’s release, readers noted that TST’s mechanism bears a strong resemblance to a 2024 publication titled “Beyond Next Token Prediction.” The research team subsequently acknowledged on Hugging Face that this was an “unfortunate case of convergent research” and pledged to update the paper with proper citations. (Source: BlockBeats)
Nous Research's TST Training Method Sparks Controversy Over Similarity to Previous Work






On May 14 (UTC+8), Nous Research unveiled its Token Stacking Training (TST) method, claiming it cuts pretraining time by a factor of 2 to 3 under the same computational budget. The method stacks adjacent tokens during early training and predicts bundles of tokens rather than individual tokens. Readers quickly observed TST’s similarity to the 2024 paper "Beyond Next Token Prediction." The team acknowledged the overlap as an "unfortunate case of convergent research" and pledged to include proper citations.
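
To make the reported mechanism concrete, the following is a minimal sketch of the two ideas described above: averaging adjacent token embeddings into bundles, and building an order-free target for the next bundle. It is not Nous Research's implementation. The function names, the fixed bundle size, the multi-hot target encoding, and the 30% phase cutoff are all assumptions chosen for illustration; the report only states that stacking covers the first 20% to 40% of training.

```python
from typing import List


def stack_tokens(embeddings: List[List[float]], bundle_size: int) -> List[List[float]]:
    """Average each run of `bundle_size` adjacent embeddings into one bundle vector.

    Trailing tokens that do not fill a complete bundle are dropped (an assumption).
    """
    bundles = []
    for start in range(0, len(embeddings) - bundle_size + 1, bundle_size):
        group = embeddings[start:start + bundle_size]
        dim = len(group[0])
        bundles.append([sum(vec[d] for vec in group) / bundle_size for d in range(dim)])
    return bundles


def next_bundle_targets(token_ids: List[int], bundle_size: int, vocab_size: int) -> List[List[float]]:
    """Build one multi-hot target per bundle, marking which tokens occur in the NEXT bundle.

    Order inside the next bundle is discarded. Multi-hot encoding is an
    assumption; the actual method may weight repeated tokens differently.
    """
    n_bundles = len(token_ids) // bundle_size
    targets = []
    for b in range(n_bundles - 1):  # the last bundle has no successor
        following = token_ids[(b + 1) * bundle_size:(b + 2) * bundle_size]
        target = [0.0] * vocab_size
        for tok in following:
            target[tok] = 1.0
        targets.append(target)
    return targets


def use_stacking(step: int, total_steps: int, stack_fraction: float = 0.3) -> bool:
    """Return True while training is still in the stacked phase.

    `stack_fraction` is hypothetical; 0.3 sits inside the reported 20%-40% range.
    """
    return step < stack_fraction * total_steps


if __name__ == "__main__":
    emb = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 0.0]]
    print(stack_tokens(emb, bundle_size=2))
    print(next_bundle_targets([0, 1, 2, 2], bundle_size=2, vocab_size=4))
    print(use_stacking(10, 100), use_stacking(50, 100))
```

With a bundle size of 2, the model would see half as many positions per sequence in the stacked phase, which is one plausible reading of how the method consumes data faster for the same compute. After the cutoff, training would continue with ordinary next-token prediction on unstacked inputs.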