ByteDance open-sources Cola DLM: A diffusion model for text generation

ME News reports that on May 16 (UTC+8), according to monitoring by Beating, ByteDance's Seed team has open-sourced Cola DLM, a continuous latent diffusion language model. Instead of following the fixed left-to-right, token-by-token generation path of autoregressive large language models, Cola DLM restructures text generation so that the model first organizes high-level semantics and then refines them into specific words.

Cola DLM is built from two core components: a Text VAE and a block-causal DiT (Diffusion Transformer). The Text VAE maps discrete text into a continuous latent space. The block-causal DiT then learns a prior over these latents using Flow Matching. Finally, a conditional decoder reconstructs the latent variables back into text. The diffusion process therefore operates on latent semantic representations rather than iteratively denoising at the token level.

The open-sourced version is a 2B-class model with approximately 2.3 billion parameters in total: 1.8 billion in the core DiT and 500 million in the VAE. According to the paper, under a unified generative evaluation protocol, Cola DLM shows scaling behavior competitive with same-sized autoregressive (AR) and LLaDA baselines across eight benchmarks (LAMBADA, MMLU, OBQA, HellaSwag, RACE, SIQA, SQuAD, and Story Cloze) and achieves the highest average score.

However, the release is a research checkpoint, not a ready-to-use conversational model. The official documentation states that the model has not undergone instruction fine-tuning or RLHF; its primary purpose is to explore how continuous latent diffusion can be applied to text generation. The paper also presents preliminary experiments extending the model toward unified text-image modeling, but the open-source repository includes only the text pipeline. (Source: BlockBeats)
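For readers unfamiliar with the two mechanisms the report names, the Python sketch below illustrates them on toy data. It is not Cola DLM's implementation. It assumes that "block-causal" carries its usual meaning: full attention within a block of positions and causal attention across blocks. It also uses the linear-interpolation (rectified-flow style) form of Flow Matching. The report does not state the block size or the exact Flow Matching variant. The helpers `block_causal_mask`, `flow_matching_pair` and `euler_sample` are hypothetical illustrations, and the Text VAE and conditional decoder are left out entirely.

```python
import random


def block_causal_mask(num_tokens: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed semantics: positions see every position in their own block
    (bidirectional) and all earlier blocks (causal), but never later blocks.
    """
    return [
        [j // block_size <= i // block_size for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


def flow_matching_pair(x0, x1, t):
    """Build one training example for linear-path Flow Matching (an assumption).

    x0: noise sample, x1: clean latent (e.g. from a text VAE), t in [0, 1].
    Returns the interpolated point x_t = (1 - t) * x0 + t * x1 and the
    velocity target x1 - x0 that a network such as a DiT would be trained
    to predict from (x_t, t).
    """
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return xt, velocity


def euler_sample(velocity_fn, x0, steps):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (latent) with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Print the toy mask: "x" = may attend, "." = masked out.
    for row in block_causal_mask(num_tokens=6, block_size=2):
        print("".join("x" if m else "." for m in row))

    random.seed(0)
    latent = [0.5, -1.0, 2.0]  # hypothetical stand-in for a VAE latent
    noise = [random.gauss(0.0, 1.0) for _ in latent]
    xt, v = flow_matching_pair(noise, latent, t=0.3)
    # An "oracle" velocity field for this single pair is the constant x1 - x0,
    # so Euler integration from the noise lands back on the latent.
    recovered = euler_sample(lambda x, t: v, noise, steps=10)
    print([round(r, 6) for r in recovered])
```

In the toy run, the trained network is replaced by the exact velocity for a single noise/latent pair, so sampling simply walks the straight line back to the latent. A real model has to learn this velocity field over the whole latent distribution. In Cola DLM's pipeline, the resulting latent would then be passed to the conditional decoder to produce text.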