ByteDance, Alibaba, Tencent Expand Model Aggregation in Cloud Services

ByteDance's Volcengine Ark Coding Plan has officially launched GLM-5.1, with the official announcement describing it as "fully aligned with the original vendor's complete capabilities, with no purchase restrictions." Previously, Volcengine's Coding Plan had long offered only older models such as GLM-4.7. This update not only adds GLM-5.1 but also integrates several of the latest domestic large models, including MiniMax-M2.7, Kimi-K2.6, and DeepSeek-V3.2.

This means developers can access multiple leading models with just one subscription fee. Market feedback indicates that this “bundle model” significantly reduces developers’ trial-and-error costs. Currently, the Lite plan costs 40 yuan per month, and the Pro plan costs 200 yuan per month, encouraging many developers to “purchase early to secure their spot.”

Zhipu's GLM-5.1 had already demonstrated impressive engineering capabilities in an update in early April 2026. Two official videos released by Zhipu, “Building a Linux Desktop from Scratch in 8 Hours” and “655 Iterations Increase Vector Database Query Throughput to 6.9 Times the Initial Production Version,” reset public expectations of how much effective work a large model can carry out in a single 8-hour run.

Reporter visits developer community; most users say the quota "doesn't last"

Journalists entered an Ark Coding developer community and found that, alongside posts sharing user experiences, many users reported significant discrepancies between expectations and reality. Scrolling through a few pages of the community reveals numerous complaints and requests for refunds, with many users openly stating, “I feel scammed.”

There are mainly two points of contention:

The first is how quickly the quota runs out. A user named "Hakimi" posted that "after a few rounds of tasks, the 5-hour limit is almost used up." Another user shared that their 5-hour limit was triggered because their account had made more than 6,040 actual requests within the rolling 5-hour window, exceeding the system limit.
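
The article does not describe how the platform actually counts requests against the 5-hour limit. As a minimal sketch, assuming the limit is enforced over a rolling time window, a counter of this kind might look like the following. The class name `RollingWindowLimiter` and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
from collections import deque


class RollingWindowLimiter:
    """Allow at most `limit` requests in any `window_seconds` span.

    Hypothetical illustration; not the platform's real accounting.
    """

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._stamps = deque()  # timestamps of accepted requests

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False  # limit reached; the caller would see a rejection
        self._stamps.append(now)
        return True


# Tiny demo with made-up numbers: 2 requests per 10 seconds.
limiter = RollingWindowLimiter(limit=2, window_seconds=10)
decisions = [limiter.allow(t) for t in (0, 1, 2, 10, 10.5, 11)]
print(decisions)
```

Under this assumption, an agent that issues many small requests per task can exhaust the window long before a human would expect, which is consistent with the complaints quoted above.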

The second is a degraded user experience caused by strained compute scheduling. Many users reported encountering 429 errors (too many requests) and said that during peak hours “first-byte delays of over one minute are common.” One user stated directly: “The five-hour rate limit is triggered too frequently. It’s impossible to use for serious development.”

Meanwhile, the Coding Plan’s low monthly price of 40 yuan hides a complication: what counts as “one call” depends on the model, because each model carries a different deduction multiplier. For instance, a screenshot posted by a user in a developer community shows the Doubao and Qwen series at a multiplier of 1, the DeepSeek series at 2, and MiniMax-M2.7, Kimi-K2.6, and GLM-5.1 at 5 each.
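
To make the effect of these multipliers concrete, the short sketch below works through the arithmetic. The multiplier values are those reported in the user's screenshot; the quota of 1,000 units is a hypothetical figure chosen only for illustration, since the article does not state the actual quota, and the function names are likewise illustrative.

```python
# Deduction multipliers as reported in the user's screenshot.
MULTIPLIERS = {
    "doubao": 1,
    "qwen": 1,
    "deepseek": 2,
    "minimax-m2.7": 5,
    "kimi-k2.6": 5,
    "glm-5.1": 5,
}


def quota_cost(model: str, requests: int) -> int:
    """Quota units deducted for `requests` calls to one model."""
    return MULTIPLIERS[model] * requests


def max_requests(model: str, quota: int) -> int:
    """Calls a quota covers if it is spent entirely on one model."""
    return quota // MULTIPLIERS[model]


HYPOTHETICAL_QUOTA = 1000  # assumption; not an official number
for model in ("doubao", "deepseek", "glm-5.1"):
    print(model, max_requests(model, HYPOTHETICAL_QUOTA))
```

With these assumed numbers, the same quota covers 1,000 calls to a Doubao model, 500 to a DeepSeek model, and only 200 to GLM-5.1, so a user who buys the plan mainly for the newest models gets one fifth of the nominal call count.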

This also shows that building a "model supermarket" is not as easy as it seems. Developers are drawn in by the price-performance ratio, but early shortcomings in areas such as compute scheduling have caused many to hesitate after trying it out. These are the growing pains of the "bundle model" approach: as user numbers grow, the computing platform's capacity is being tested. Finding a sustainable balance between low pricing and service quality will be a long-term challenge for Volcengine and other entrants.

Cloud providers collectively shift toward "model marketplaces," with initial layering and standardization emerging

This "integrated" update to Volcengine's Coding Plan is not an isolated event.

Since early 2026, major cloud providers such as Alibaba Cloud, Baidu Intelligent Cloud, and Tencent Cloud have been advancing their multi-model integration strategies. Alibaba Cloud was among the first to launch a multi-model subscription plan, the "Bailian Coding Plan," which currently supports models including the Qwen series, Kimi-K2.5, GLM-5, and MiniMax-M2.5. The Pro plan is priced at 200 yuan per month; the Lite plan closed to new purchases on March 20 and to renewals and upgrades on April 13.

Tencent Cloud's large model Coding Plan subscription service was fully updated in March 2026 and now supports several of the latest models, including Tencent HY 2.0 Instruct, GLM-5, Kimi-K2.5, and MiniMax-M2.5. Baidu Qianfan officially launched its AI coding subscription service, Coding Plan, in February 2026, making it one of the earliest cloud providers in China to offer such a service.

The "model supermarket" model is no longer just one company's choice—it is becoming a competitive frontier that cloud providers are vying to enter. But beneath the surface of cloud providers' aggregation strategies, the new core of competition lies in who can deliver more stable services, more transparent quota rules, more flexible disaster recovery mechanisms, and broader enterprise-grade capabilities beyond programming—along with whether they can maintain strong renewal rates.

Internationally, model aggregation platforms such as Amazon Bedrock and Microsoft Azure differ from domestic Coding subscription models, but both reflect the same integration trend.

Overall, industry competition is shifting from a focus on individual model capabilities to a competition centered on platform integration and ecosystem service capabilities, leading to a rapid increase in industry concentration.

Wang Kai, Chief Asset Allocation Analyst at Guoxin Securities, told reporters that although industry differentiation is accelerating, it may be premature to call this a consolidation phase. “More accurately, this is a refinement and iteration of specialization along the industrial chain: model providers focus on algorithms, while cloud providers focus on engineering delivery, each leveraging their core strengths.” He believes that regardless of whether other cloud providers follow suit, the competitive landscape will evolve from competition between individual players toward differentiation between ecosystems.

Are large model companies under growing pressure of being reduced to "pipelines"?

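Being reduced to a "pipeline" does not mean that model companies disappear. It means that they lose their product premium, their direct connection to users, and their bargaining power, while profits shift to the compute platforms and the model companies are left in a "subordinate" role.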

Under the tide of aggregation by cloud providers, "commoditization" has become the sword of Damocles hanging over independent large model companies. In this silent struggle, leading players such as Zhipu AI, Moonshot AI (Kimi), and MiniMax have not chosen passive compromise; instead, they have grown organically from their core, carving out distinct paths to breakthrough.

At an open dialogue on April 8, Zhipu AI CEO Zhang Peng clearly stated that Zhipu’s ultimate goal has never been to become a “disposable, interchangeable API tool,” but rather to build fully autonomous agents. This positioning aims to elevate Zhipu from a “model provider” to a “task executor,” thereby avoiding the price trap of pure API pipelines.

Moonshot AI (Kimi) adopts a strategy of "distributed deployment + deep focus on long-form text." It integrates simultaneously with major cloud platforms such as Volcengine and Alibaba Cloud, which gives it multiple sources of computing power, avoids dependency on a single provider, and supports service stability and cost control. Kimi K2.6, launched in April 2026, employs a Mixture of Experts (MoE) architecture with a standard context window of 256K tokens.

MiniMax is focusing its core resources on vertical domains such as content creation, intelligent customer service, education, enterprise services, and entertainment socializing, with particular emphasis on scenarios like game AI, digital humans, and multimodal interactions, to build customized capabilities that cloud platforms cannot replicate.

Will platform integration by big tech companies accelerate the "commoditization" of model companies? Wang Kai, Chief Asset Allocation Analyst at Guoxin Securities, believes it's necessary to distinguish between short-term and long-term perspectives.

“In the short term, distribution channels are controlled by platforms, pricing power is partially ceded, and profit shifts from model developers to entry points—that’s a business norm,” he said. “But in the long term, general-purpose models are prone to homogenization; deep learning models for vertical domains like finance, healthcare, and law cannot have their professional barriers erased simply through centralized aggregation.”

In addressing platformization risks, one can also draw insights from OpenAI and Anthropic’s strategies: on one hand, strengthen direct channels to end users, as the independent operation of ChatGPT and Claude aims to build user connections that bypass platforms; on the other hand, the speed of technological iteration and user brand recognition serve as two effective moats, so model companies must balance investment in R&D with productization strategies.

The ultimate outcome of this "pipeline versus platform" contest may not be one side consuming the other, but rather a clearer division of labor: cloud providers focus on pipelines, model companies focus on technology, and both gradually define their own boundaries of survival through this dynamic.

As for who will swallow whom, the story is still far from over.

This article is from the WeChat official account "Science and Technology Innovation Board Daily," authored by Wang Nai.