Moonshot's K2.6 Release and Price Increase Viewed as Pre-IPO Preparation

Article by Xiangxianzhi

The night before last, Moonshot released Kimi K2.6 and increased the API input price from $0.60 to $0.95 per million tokens.

That is a 58% increase, and the first since the K2 series launched.

But it seems no one is paying attention to this.

Four months ago, in an internal letter dated the last day of 2025, Yang Zhilin wrote that Moonshot AI “is not in a rush to IPO.” By then, both Zhipu and MiniMax had already filed prospectuses with the Hong Kong Stock Exchange, so the line was clearly a deliberate effort to set Moonshot's positioning apart.

He also wrote in that letter that the company’s cash reserves exceeded $1.4 billion and that its Series C round had been oversubscribed by $500 million. The implication: private funding had not yet been exhausted, so there was no hurry to tap the public market.

Three months later, Bloomberg reported that he began reaching out to CICC and Goldman Sachs. Three weeks after that, K2.6 launched.

In just four months, someone who dislikes rushing has set about doing the very thing he said he wouldn’t do.

K2.6 is certainly not the last product Moonshot will release before listing. But this launch is, in effect, Yang Zhilin's first roadshow since the listing plan took shape.

Kimi has never released a model like this before.

Kimi used to have a set routine for releasing models.

Publish a technical report, open-source the weights, climb the Hugging Face leaderboards, then wait for scrutiny from the technical community. K1.5 took aim at o1’s reasoning approach, and its report put technical detail ahead of benchmark numbers; K2 Thinking uploaded its weights straight to Hugging Face and let developers run their own tests. These moves were clearly aimed at developers and researchers.

The messaging followed the standard tech-community script: we solved a specific problem, our method is better, and we welcome replication.

K2.6 plays it differently.

First, the price increase. In RMB terms, K2.6 charges 6.5 yuan per million input tokens on a cache miss, up from 4 yuan for K2.5. The output price rises from 21 yuan to 27 yuan. The cache-hit price is 1.1 yuan.

This is a structured price increase. Every tier goes up, but the cache-hit tier rises the least in absolute terms: from 0.7 yuan to 1.1 yuan, or about $0.16 per million tokens in USD.
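The tier-by-tier change is simple arithmetic. The sketch below recomputes it from the RMB prices quoted above; the `TIERS` table and the `change` helper are illustrative names, not anything Moonshot publishes.

```python
# Per-tier prices in yuan per million tokens, (K2.5, K2.6), as quoted in the article.
TIERS = {
    "input, cache miss": (4.0, 6.5),
    "input, cache hit": (0.7, 1.1),
    "output": (21.0, 27.0),
}


def change(old: float, new: float) -> tuple[float, float]:
    """Return (absolute increase, percentage increase) from old to new."""
    delta = new - old
    return delta, delta / old * 100


for tier, (old, new) in TIERS.items():
    delta, pct = change(old, new)
    print(f"{tier:18s} {old:5.1f} -> {new:5.1f}  +{delta:.1f} yuan (+{pct:.1f}%)")

# The headline USD input price moved from $0.60 to $0.95 per million tokens.
print(f"USD input price: +{change(0.60, 0.95)[1]:.1f}%")
```

Run as written, it shows the cache-hit tier with the smallest absolute increase (0.4 yuan), while in percentage terms the output tier rises least (about 29%, against roughly 57% for cache hits and 62.5% for cache misses).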

This $0.16 is the key to understanding this price increase.

For enterprise users who send the same system prompt on every call, such as coding assistants, agent orchestration frameworks, and intelligent customer service, the prompt prefix is heavily reused and cache hit rates reach 75% to 83%. For these customers, Moonshot has kept the absolute increase small.
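Why a high hit rate softens the increase can be shown with a blended input price: the hit rate times the cache-hit price, plus the remainder times the cache-miss price. This is a minimal sketch that assumes each request's input is billed entirely at one rate or the other and ignores output tokens (whose price rose from 21 to 27 yuan for everyone); `blended_input_price` is a hypothetical helper.

```python
# Input prices in yuan per million tokens, as quoted in the article.
PRICES = {
    "K2.5": {"hit": 0.7, "miss": 4.0},
    "K2.6": {"hit": 1.1, "miss": 6.5},
}


def blended_input_price(model: str, hit_rate: float) -> float:
    """Effective input price (yuan per 1M tokens) at a given cache hit rate.

    Assumes a simple two-rate model: cached tokens at the hit price,
    everything else at the miss price.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    p = PRICES[model]
    return hit_rate * p["hit"] + (1.0 - hit_rate) * p["miss"]


# 0% approximates an occasional user; 75% and 83% are the enterprise range cited.
for rate in (0.0, 0.75, 0.83):
    old = blended_input_price("K2.5", rate)
    new = blended_input_price("K2.6", rate)
    print(f"hit rate {rate:.0%}: {old:.2f} -> {new:.2f} yuan (+{new - old:.2f})")
```

At a 0% hit rate the input price rises by 2.5 yuan per million tokens; at the 75% to 83% hit rates cited above, the rise is under 1 yuan. The percentage increase is similar either way; what shrinks is the absolute amount.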

The full increase lands on occasional users whose prompts change from call to call.

This is a friendly price adjustment for businesses already integrated with Kimi, and an unfriendly one for individual users still comparing prices. The former are the “enterprise locked-in customers” featured in the IPO story; the latter are the “long-tail users” never mentioned in pitch decks. Moonshot AI knows exactly who its valuation assets are.

The compute profile of the Agent era differs from that of the chat era. A chat model handles round trips of dozens of tokens, while an Agent may make thousands of tool calls and consume hundreds of thousands of tokens. K2.6’s official case studies include deploying the Qwen3.5 model locally on a Mac with over 4,000 tool calls across 12 hours, rebuilding the open-source matching engine exchange-core with over 1,000 tool calls in 13 hours, and, most extreme of all, autonomously running monitoring, alerting, and incident response for five days. In scenarios like these, a single task consumes hundreds to thousands of times as many tokens as a K2.5-era chat session.

These cases are, of course, meant to showcase long-horizon capability. But add K2.6’s 300-agent clusters, and token consumption becomes enormous.

At the old price of $0.60, a single run of such an Agent task could lose money. At $0.95, it barely covers the inference cost.
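How the bill for one long-running task scales with price can be sketched as tool calls × average input tokens per call × price per million. Only the $0.60 and $0.95 prices and the 4,000-tool-call figure come from the article; the 50,000-token average context per call is a hypothetical assumption, and the article gives no serving-cost figures, so this shows only the revenue side, not whether a task loses money.

```python
def task_input_cost_usd(tool_calls: int, avg_input_tokens_per_call: int,
                        price_per_million_usd: float) -> float:
    """Input cost in USD for one task, assuming every call is billed uncached."""
    total_tokens = tool_calls * avg_input_tokens_per_call
    return total_tokens / 1_000_000 * price_per_million_usd


TOOL_CALLS = 4_000                 # from the 12-hour Mac deployment case
ASSUMED_TOKENS_PER_CALL = 50_000   # hypothetical average context per call

for label, price in (("old", 0.60), ("new", 0.95)):
    cost = task_input_cost_usd(TOOL_CALLS, ASSUMED_TOKENS_PER_CALL, price)
    print(f"{label} price ${price:.2f}/M -> ${cost:,.2f} input cost per task")
```

Because the cost is linear in every factor, the 58% price increase passes straight through to per-task revenue whatever the real context size is; cache hits would lower both figures.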

So the price increase is not a sign of confidence but a necessity. Moonshot has raised $2.5 billion in total, and its Series C and C+ rounds left it with $1.4 billion in cash reserves. But if the next-generation K3 really scales to 3 to 4 trillion parameters, a single pretraining run could consume half of that.

Without a price increase, gross margins in the final quarters before listing would look weak, and the prospectus must disclose gross margin data.


Moonshot could have said this openly: the Agent era requires a new pricing model. It didn’t, because end users have only just left the free era of K2 Thinking, and telling them “we’re raising prices” now is not a good product story.

The price increase is a story for a different audience: Kimi already has a base of enterprise customers who can’t do without it and will pay more to keep using it. (I’m one of them.)

The second thing is benchmarking. K2.6’s official comparison models are GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro, all of them previous-generation flagships.

That same week, Anthropic released Claude Mythos, and Opus 4.7 had just launched; both are a generation ahead of Opus 4.6. K2.6 is not compared against either.

This is a deliberate choice. Next to Mythos, K2.6 looks like a “follower”; next to Opus 4.6, it lands in the “leading tier.” An $18 billion valuation requires the latter.

Kimi didn’t use to do things this way. When K2 Thinking was released, the team ran the full benchmark suite and published every result, strong and weak, so developers could judge for themselves. That is how the tech community works: it learns your strengths and weaknesses, and it will accept a model with clear shortcomings as long as its direction is well defined.

A roadshow deck is different. It needs a conclusion a fund manager can grasp in 30 seconds: “Comparable to or better than top international closed-source models.” That sentence is taken verbatim from the official K2.6 blog.

The third thing is the dual track of Agent clusters and open source. K2.6 upgraded a feature called Claw Groups, a heterogeneous Agent ecosystem in which Agents running on different devices, models, and toolchains work in a shared collaboration space, with K2.6 as the orchestrator. 300 sub-Agents run in parallel, collaborate across 4,000 steps, and operate autonomously for five days.

These figures are aimed at enterprise customers, not developers. To a developer, “300 agents running in parallel” means little in practice; nobody runs 300 agents in a local project. The configuration only makes sense for one kind of customer: a large enterprise that needs a matrix of agents to automate end-to-end operations.

The story being borrowed is Salesforce’s, not Hugging Face’s.

Meanwhile, K2.6 is fully open source. At the Zhongguancun Forum on March 26, Yang Zhilin said that open source would win an absolute victory.

Open source plus enterprise-grade agent clusters puts Moonshot halfway between DeepSeek and Anthropic, embracing both business models at once. It sounds like a compelling story, but claiming both sides means having to prove yourself on both fronts.

For now, the capital market doesn’t care whether those questions have answers; it only requires that every line of the deck has a story.

Taken together, the price increase, the benchmark choices, and the agent clusters share one unusual trait: none of them is aimed at the tech community.

In the past, Kimi's underlying strategy was: if developers like us, enterprise customers will eventually follow, and the capital market will follow them. This approach has a name: technical sincerity.

K2.6 no longer waits. The price increase is a direct assertion of B2B pricing power; benchmarking against GPT-5.4 stakes out a valuation position early; the agent clusters and Claw Groups serve as showrooms for the enterprise-services narrative.

Each point corresponds to a question on the pitch deck: What is your commercialization capability? Where do you stand in comparison to competitors? What is your B2B moat?

Compressing the gap from preview to GA to just eight days follows the same logic. Earlier K2 releases had preview periods of two to three months, giving the community time to test, give feedback, and iterate. K2.6 skipped that buffer, not because the technology matured faster, but because the window of opportunity will not wait.

The IPO targets the second half of 2026. Under HKEX procedures, filing, inquiries, the hearing, the roadshow, pricing, and the cooling-off period take 4 to 6 months. Launching roadshows in September means the product must be ready by April.
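The “ready by April” conclusion is backward scheduling: subtract the process length from the target month. This sketch applies the 4-to-6-month range naively to a September 2026 roadshow, as the paragraph does; `months_before` is a hypothetical helper that counts calendar months only and ignores the finer ordering of HKEX steps.

```python
def months_before(year: int, month: int, n: int) -> tuple[int, int]:
    """Return the (year, month) that lies n calendar months before (year, month)."""
    index = year * 12 + (month - 1) - n  # months since year 0, zero-based month
    return index // 12, index % 12 + 1


ROADSHOW = (2026, 9)  # September 2026 roadshow, as described in the text

for lead_months in (4, 5, 6):
    y, m = months_before(*ROADSHOW, lead_months)
    print(f"{lead_months} months before {ROADSHOW[0]}-{ROADSHOW[1]:02d}: {y}-{m:02d}")
```

Five months back from September lands on April, the midpoint of the range; the four- and six-month ends give May and March.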

If GA doesn’t ship in April, there will be no later window.

K3 is the real finale.

But K2.6 is not the strongest card Moonshot can play.

The official blog includes a restrained statement: K2.6 is “the runway for K3.”

Twelve-hour long-horizon coding, 300-agent clusters, and a context compressor are not the final form of the K2 series; they are execution-layer infrastructure built for a larger foundation model to use. Moonshot would not put this much effort into making the system work unless it were confident that a much larger model will put these capabilities to use.

Rumors that surfaced earlier on Reddit put K3’s target at 3 to 4 trillion parameters. Compared with the trillion-parameter K2 series, that would be a foundational leap.

If K3 can launch before the roadshow window, that’s the real answer. K2.6 laid the runway—K3 takes off.

The question is whether it’s possible to catch up. How long does it take to train a 3-4 trillion parameter model? Both GPT-5 and Claude Opus 4.6 have approximate pre-training cycles of 6 to 9 months, followed by several additional months for post-training and safety evaluations. Given Moonshot’s current computing capacity—based on its partnership with Alibaba Cloud and existing cash reserves—can this timeline be compressed to 5 to 6 months?

That bet rides on K2.6.

Eight days from preview to GA, scaling the Agent cluster from 100 to 300 in one go, extending long-running tasks from hundreds of steps to 4,000—each move compresses time and creates space for K3’s potential.

If K3 can be launched before August or September, it will be the grand finale of the roadshow.

If it misses the deadline, K3 becomes a model that can only launch after listing, and K2.6 must carry the entire valuation narrative alone.

Moonshot is betting that it can be done.

What is the $18 billion valuation anchored to?

Back to valuation.

Three months ago, Moonshot's valuation was $4.3 billion; two months ago, it was $5.5 billion; now it is $18 billion.

Moonshot did not become four times stronger over the past three months. Rather, after Zhipu and MiniMax went public, their valuations roughly quadrupled and raised the ceiling for the entire sector. Zhipu’s market capitalization in Hong Kong is HK$305 billion and MiniMax’s is HK$309.2 billion, both above SenseTime’s historical peak.

The logic behind these two companies’ market caps is not “what the next generation of technology can achieve” but “how highly AI assets can be valued on the Hong Kong stock market.”

Moonshot’s $18 billion valuation is anchored to the same thing: the task is no longer to prove it is the strongest Chinese AI company, but to prove it is a Chinese AI company that can be valued.

Every move in K2.6, from the price increase and benchmark choices to the Agent clusters and the open-source dual track, responds to this proposition.

But there is one thing K2.6 has not yet proven. Will Kimi’s consumer users pay for the more expensive K2.6? Will paying subscribers migrate to DeepSeek or MiniMax? Among enterprise customers, how many are actually running Claw Groups, and how many have only signed proofs of concept (POCs)?

Investors will certainly ask for these numbers at the roadshow. For now, K2.6 can only show the product; whether the product turns into numbers depends on the next three months.

When Zhipu went public, its prospectus showed it had not yet turned a profit; the same was true of MiniMax. Investors accepted this because the broader story of “Chinese AI assets” was only beginning to emerge. Moonshot is arriving half a year later. Facing the same question, Zhipu and MiniMax could say, “We’re validating,” but Moonshot must say, “We’re monetizing.”

All of this pressure falls within the three months between K2.6 and K3.

So, back to the original question: is K2.6 the final roadshow before Moonshot’s listing?

No.

If K3 catches the roadshow window, it is the true climax and K2.6 merely paves the way for it. If K3 misses the window, K2.6 must carry the entire listing narrative on its own, and it becomes Yang Zhilin’s forced debut presentation.

Neither outcome is what Yang Zhilin wanted four months ago.

But everything that happened over these four months, from the Zhipu and MiniMax listings to the raised valuation ceiling and the narrowing window, has forced someone who dislikes haste to move fast.

K3 will be the second round.