ByteDance open-sources Cola DLM: A diffusion model for text generation

Summary
On May 16 (UTC+8), ByteDance's Seed team open-sourced Cola DLM, a continuous latent diffusion language model for text generation, according to MetaEra. The model combines a Text VAE with a block-causal DiT and generates text by first organizing high-level semantics and then refining them into words. The 2B-class open-source version contains approximately 2.3 billion total parameters (1.8 billion in the core DiT and 500 million in the VAE) and, according to the paper, is competitive with same-sized AR and LLaDA baselines across eight benchmarks. It remains a research checkpoint and is not a dialogue model, as it has not undergone instruction fine-tuning or RLHF.

According to ME News, citing monitoring by Beating, ByteDance's Seed team open-sourced Cola DLM on May 16 (UTC+8). Cola DLM is a continuous latent diffusion language model designed to bypass the fixed left-to-right, token-by-token generation path of autoregressive large language models: it first organizes high-level semantics and then refines them into specific words.

The core of Cola DLM consists of a Text VAE and a block-causal DiT. The Text VAE maps discrete text into a continuous latent space, the block-causal DiT learns the latent prior via Flow Matching, and a conditional decoder then reconstructs text from the latent variables. Diffusion therefore operates on latent semantic representations rather than iteratively denoising at the token level.

The open-sourced version is a 2B-class model with approximately 2.3 billion total parameters: 1.8 billion in the core DiT and 500 million in the VAE. According to the paper, under a unified generative evaluation protocol, Cola DLM shows scaling performance competitive with same-sized AR and LLaDA baselines across eight benchmarks (LAMBADA, MMLU, OBQA, HellaSwag, RACE, SIQA, SQuAD, and Story Cloze) and achieves the highest average score.

However, it remains a research checkpoint and is not a ready-to-use conversational model. The official documentation states that the model has not undergone instruction fine-tuning or RLHF; its primary purpose is to explore how continuous latent diffusion can be applied to text generation. The paper also presents preliminary experiments extending the model toward unified text-image modeling, but the open-source repository includes only the text pipeline. (Source: BlockBeats)
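
For readers unfamiliar with the two ideas named above, block-causal attention and Flow Matching, the following is a minimal illustrative sketch in plain Python. It is not taken from the Cola DLM repository. The function names, the block size, and the straight-line interpolation path between noise and latent are all assumptions chosen for illustration; the article does not say which path or block size the model actually uses.

```python
def block_causal_mask(num_positions, block_size):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption for illustration: latent positions are grouped into
    consecutive blocks of `block_size`. A position sees every position in
    its own block and in earlier blocks, but nothing in later blocks.
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(num_positions)]
        for i in range(num_positions)
    ]


def flow_matching_pair(noise, latent, t):
    """Build one Flow Matching training example at time t in [0, 1].

    Assumption: a straight-line path from `noise` (t = 0) to `latent`
    (t = 1), one common choice. Returns the point on the path and the
    constant velocity (latent - noise) that a network would be trained
    to predict at that point.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(noise, latent)]
    velocity = [b - a for a, b in zip(noise, latent)]
    return x_t, velocity


def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    squared = [(p - q) ** 2 for p, q in zip(predicted_velocity, target_velocity)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    # Four latent positions in blocks of two: rows 0-1 see only block 0,
    # rows 2-3 see both blocks.
    for row in block_causal_mask(4, 2):
        print(["x" if allowed else "." for allowed in row])

    x_t, v = flow_matching_pair([0.0, 2.0], [1.0, 4.0], 0.5)
    print(x_t, v)
```

In a real model the latent vectors would come from the Text VAE encoder and the velocity would be predicted by the DiT under this kind of mask; here they are hard-coded lists so the sketch stays self-contained.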

Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.