Volcano Engine has launched the Doubao-Seed-2.1 series, comprising the flagship Doubao-Seed-2.1-Pro and the lightweight Doubao-Seed-2.1-Turbo, with APIs now available on Volcano Ark. The video generation model Seedance 2.5 is expected to be officially released in early July, and Doubao Audio Generation Model 1.0 has opened for invite-only testing, extending the product line from language models to video and audio generation.
Usage volume and market share
At the launch event, Tan Dai, President of Volcano Engine, said that as of June this year the Doubao large model's average daily token usage had exceeded 180 trillion, more than ten times the figure a year earlier. The company also stated that Volcano Engine holds a 49.5% share of China’s public cloud MaaS (model-as-a-service) market.
These figures suggest that Chinese enterprises' demand for large model calls continues to grow rapidly, and that Volcano Engine is focusing on production-grade use cases rather than merely demonstrating model capabilities.
Priced for enterprise deployment
Doubao-Seed-2.1-Pro is priced at RMB 6 per million input tokens and RMB 30 per million output tokens; on a cache hit, the input price drops to RMB 1.2 per million tokens. Volcano Engine states that in coding and agent scenarios the blended cost can fall to RMB 1.96 per million tokens.
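To make the pricing arithmetic concrete, the following is a minimal sketch of how a blended per-million-token price could be computed from the three quoted rates. The function names and the example workload mix are hypothetical, and the article does not say how Volcano Engine arrived at its RMB 1.96 figure, so this does not attempt to reproduce it.

```python
# Prices in RMB per million tokens, as quoted for Doubao-Seed-2.1-Pro.
INPUT_PRICE = 6.0          # uncached input
CACHED_INPUT_PRICE = 1.2   # input served from cache
OUTPUT_PRICE = 30.0        # output


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Return the cost in RMB of a workload (hypothetical helper).

    cached_tokens is the portion of input_tokens billed at the cache-hit rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / 1_000_000


def blended_price(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Return the average price in RMB per million tokens across input and output."""
    total_tokens = input_tokens + output_tokens
    if total_tokens == 0:
        raise ValueError("workload contains no tokens")
    return request_cost(input_tokens, cached_tokens, output_tokens) / total_tokens * 1_000_000


if __name__ == "__main__":
    # Assumed agent-style mix: long, mostly cached context and short outputs.
    mix = dict(input_tokens=1_000_000, cached_tokens=900_000, output_tokens=20_000)
    print(f"cost: RMB {request_cost(**mix):.2f}")
    print(f"blended: RMB {blended_price(**mix):.2f} per million tokens")
```

The blended figure is sensitive to the cache-hit share and the input-to-output ratio, which is why input-heavy coding and agent workloads can average well below the headline input price.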
The Turbo version is priced lower still while approaching Pro-level capabilities, which the company positions for high-frequency usage scenarios. Volcano Engine also offers a continuously evolving version, Doubao-Seed-Evolving, that lets enterprises receive subsequent model updates without changing their API endpoint.
Heavy focus on coding and agents
Volcano Engine is placing special emphasis on programming and agent capabilities in this release. The company states that Doubao-Seed-2.1-Pro approaches or surpasses international models such as OpenAI's GPT-5.5 and Anthropic's Claude Opus 4.7 on multiple programming and task execution benchmarks.
During the live demonstration, Volcano Engine presented an RTL (register-transfer level) chip design case. The company stated that the model ran continuously for nearly 18 hours, went through nine rounds of iteration, and generated six core modules totaling 1,303 lines of RTL code, which passed simulation, testing, and synthesis verification.
For the agent scenario, Volcano Engine also demonstrated a multi-agent collaboration case. According to the company, a developer orchestrated more than 500 agents, which together made thousands of tool calls to construct a large-scale 3D city map.
Multimodal products continue to expand
In addition to the language models, Volcano Engine announced that Seedance 2.5 will be officially released in early July. The video model can generate single clips up to 30 seconds long and accepts up to 50 multimodal inputs in combination. The company states that the new version further improves shot continuity and video editing control.
At the same event, Volcano Engine released Doubao Audio Generation Model 1.0, which takes text and reference audio as input and directly generates complete audio content, including multi-character dialogue, background music, and environmental sound effects. The model supports up to two minutes of audio per request, and its API is currently available by invitation on Volcano Ark.

Office and development tools integrated
Volcano Engine states that Doubao-Seed-2.1 has been integrated with partners such as WPS, Dedao, and Unity, is compatible with development tools such as Claude Code and Codex, and is available in tools including TRAE and Kozu.

Judging by release timing, pricing, and integration scope, Volcano Engine is positioning Doubao for broader enterprise procurement and productivity use cases. As the video and audio models roll out, competition among Chinese large models is shifting from text-only capability to multimodal generation and real-world delivery.
