— Starting from Gonka's talk at LA Hacks 2026
On April 26, DeepSeek rolled out new pricing for its V4 series APIs: the cache-hit price for input tokens dropped to one-tenth of the launch price across all models, and with an additional limited-time discount on the Pro version, processing one million tokens now costs as little as 0.025 yuan, nearly 100 times cheaper than a year ago. The same day, China's A-share computing power sector rallied across the board, igniting market enthusiasm.
But behind the cheers, there is an issue no one is directly addressing: as models become cheaper, the computing power required to run them is becoming increasingly centralized.
Data doesn't lie. In the fourth quarter of 2025, the combined capital expenditures of Microsoft, Amazon, Meta, and Google increased by 64% year-over-year to $118.6 billion; total capital expenditures for 2026 are projected to rise a further 53% year-over-year to $570.8 billion. Google also raised its 2026 TPU chip shipment target by 50% to 6 million units. The delivery lead time for NVIDIA’s H100 series has stretched to several months in some markets.
Pricing power at the model layer is shifting toward developers, but control at the compute layer is consolidating even faster among a few giants. This is a hidden yet profound contradiction of the AI era.

Against this backdrop, on April 24, 2026, Gonka Protocol co-founders Daniil and David Liberman took the stage at LA Hacks 2026. UCLA's flagship event and the largest annual collegiate hackathon featured the Liberman brothers as keynote speakers, addressing hundreds of top engineers on the verge of entering the industry. The question they posed resonated with particular clarity at this moment: is decentralized compute still achievable?
I. The Other Side of the Price Cut Wave
The DeepSeek V4 price cut appears to stem from efficiency gains driven by technical advances: the new attention mechanism shrinks per-token dimensions, and combined with DSA sparse attention it significantly lowers compute and memory demands. But for the price cut to be sustainable, it rests on one premise: that somewhere, computing power is sufficiently abundant and cheap.
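As a rough illustration of why those two levers compound, here is a back-of-envelope sketch in Python. Every number in it (context length, per-token widths, sparsity ratio) is invented for illustration, not taken from DeepSeek's actual architecture:

```python
# Back-of-envelope sketch. All numbers are invented for illustration;
# they are NOT DeepSeek's real architecture figures. The point: a
# narrower per-token representation and sparse attention multiply.

context_len = 128_000           # tokens held in context (illustrative)
dim_old, dim_new = 4096, 512    # per-token KV width, before/after (illustrative)
sparsity = 0.10                 # fraction of tokens each query attends to
bytes_per_elem = 2              # fp16

# KV-cache memory scales with tokens_kept * per-token width.
kv_old_gb = context_len * dim_old * bytes_per_elem / 1e9   # ~1.05 GB per layer
kv_new_gb = context_len * dim_new * bytes_per_elem / 1e9   # ~0.13 GB per layer

# Attention FLOPs scale with queries * tokens_attended * width.
flops_ratio = (dim_new / dim_old) * sparsity               # 0.0125

print(f"KV cache per layer: {kv_old_gb:.2f} GB -> {kv_new_gb:.2f} GB")
print(f"attention FLOPs: {1 / flops_ratio:.0f}x fewer")
```

Under these made-up numbers, memory shrinks 8x and attention compute 80x, which is the shape of argument behind any claim that a 10x price cut can be margin-sustainable.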
The reality is that this “sufficient” source of computing power is rapidly consolidating among a small number of nodes globally. Not long ago, Lumentum’s CEO Michael Hurlston stated that, based on current trends, the company’s production capacity through 2028 is nearly fully pre-sold. This is not an isolated challenge for one company, but rather a collective strain across the entire AI infrastructure supply chain in the face of rapidly growing demand.
Daniil used a simple yet powerful contrast in his LA Hacks talk: the computing power of the Bitcoin network now exceeds the combined total of Google, Microsoft, and Amazon’s cloud data centers—but what is this power actually doing? Solving a hash puzzle that no one needs an answer to. The same is true for the world’s idle GPU power: graphics cards in gamers’ machines, servers in university labs, and spare capacity from small and medium cloud providers—collectively massive, yet unusable for AI inference due to the lack of a coordinating mechanism.
Gonka aims to solve exactly this coordination problem—using the incentive mechanism of proof of work to organize globally dispersed idle GPUs into a network capable of handling real AI inference tasks.
II. Inference Is the New Battlefield
DeepSeek's price cut has sparked broad discussion of "AI democratization" on Chinese internet platforms. But one detail is easily overlooked: what fell is the price of invocation, not the cost of computing power. As AI applications scale, inference request volume is growing exponentially; industry projections estimate that by 2026, inference will account for roughly two-thirds of global AI compute consumption.
What does this mean? Each time the cost per invocation drops by an order of magnitude, demand expands by even more, so the total computing power required increases rather than decreases. The "democratization" of large models, to some extent, accelerates centralization at the compute layer, because only players with massive computing resources can keep inference services running on ultra-thin margins.
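The arithmetic is worth spelling out; this is the dynamic often described as the Jevons paradox, applied to compute. The prices and volumes below are invented for illustration, not DeepSeek's real traffic figures:

```python
# Illustrative numbers only: a 10x price cut answered by a 30x jump
# in invocation volume still triples the total compute bill, and the
# physical compute behind it.

old_price = 0.25    # yuan per million tokens (invented)
new_price = 0.025   # after a 10x cut

old_volume = 1_000    # million tokens served per day (invented)
new_volume = 30_000   # demand response to cheaper calls (invented)

old_spend = old_price * old_volume   # 250 yuan/day
new_spend = new_price * new_volume   # 750 yuan/day

print(f"unit price: {old_price / new_price:.0f}x cheaper")
print(f"total spend: {new_spend / old_spend:.1f}x higher")   # 3.0x
```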
A structural lock-in is taking shape: whoever controls physical compute on the inference side controls the true infrastructure gateway of the AI era. From this perspective, the significance of decentralized compute networks goes beyond mere "50% cheaper" cost optimization: they offer a structural alternative before centralized lock-in is complete.
III. A Real Challenge to Young Builders
Participants in LA Hacks, engineers and product builders from California's top universities, will soon face an unromantic engineering decision: which layer of the compute stack to build their products on. Three questions make the stakes concrete:
Which server does your AI product use for inference?
Do you have the ability to migrate if that platform adjusts its pricing or access policies?
Are you building user scale to create value for yourself or to supply the platform with leverage?
Developers have already experienced these issues in the Web2 era: when an application’s fate is deeply tied to platform algorithms or distribution rules, “independence” becomes a term that must be constantly redefined. In the AI era, reliance on computing power will replicate the same logic at the infrastructure level, and because switching costs are higher, lock-in effects will only be stronger.

Hackathons, as a format, contain an inherent irony: building a working product in 36 hours, with minimal resources and maximum speed, is precisely the state that decentralized network incentive mechanisms strive to induce. When Daniil took the stage at LA Hacks, he wasn't just presenting Gonka; he was asking the audience: will what you build next accelerate the centralizing trend, or open new possibilities?
IV. PoW 2.0: An Engineering Challenge
Gonka redirects the proof-of-work incentive structure from hash computation to AI inference, so that nearly 100% of the network's computational contribution corresponds to real-world tasks. The mechanism carries a critical engineering requirement: inference must be verifiable and reproducible. Given the same model weights, the same random seed, and the same input, any node must be able to reproduce the result and validate its correctness. Solving this is the core engineering problem that took Gonka from academic prototype to functioning network.
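To make the reproducibility requirement concrete, here is a minimal sketch of the replay-and-compare pattern it implies. The function names and the hash-based stub are hypothetical; this illustrates the general idea, not Gonka's actual protocol code:

```python
import hashlib
import json

def run_inference(weights_hash: str, seed: int, prompt: str) -> str:
    """Stand-in for a deterministic inference call (hypothetical stub).

    A real node would load the model identified by weights_hash, seed
    its sampler, and decode; with fixed weights, seed, and input, the
    output token sequence is fully determined. Here we just derive a
    stable value from the inputs to keep the sketch self-contained.
    """
    material = json.dumps({"w": weights_hash, "s": seed, "p": prompt})
    return hashlib.sha256(material.encode()).hexdigest()

def verify_claim(weights_hash: str, seed: int, prompt: str,
                 claimed_output: str) -> bool:
    """Any node can replay the same deterministic computation and
    compare; a mismatch means the claimed result is rejected."""
    return run_inference(weights_hash, seed, prompt) == claimed_output

# A miner publishes (inputs, output); a verifier replays and checks.
claim = run_inference("sha256:abc123", seed=42, prompt="2+2=")
assert verify_claim("sha256:abc123", seed=42, prompt="2+2=", claimed_output=claim)
```

Determinism is the design choice doing the work here: it turns "trust the node" into "re-run the computation", which is what lets reward flow to provably useful work.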
From an economic perspective, the significance of this mechanism is that token value is anchored to the cost of physical computing power rather than to liquidity-driven sentiment. Miners who contribute compute earn rewards; developers who invoke compute pay fees; the loop depends on no intermediary's goodwill.
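As a toy illustration of that loop, the accounting reduces to a two-sided transfer with nothing in the middle. All names and numbers below are invented; this is not Gonka's actual fee schedule or token logic:

```python
from dataclasses import dataclass, field

@dataclass
class ToyNetwork:
    """Toy ledger for the fee/reward loop. Names and parameters are
    invented; this is not Gonka's actual economics."""
    balances: dict = field(default_factory=dict)

    def serve_inference(self, developer: str, miner: str,
                        tokens_m: float, price_per_m: float) -> None:
        # The developer pays for the call; the miner whose GPU did the
        # work earns the fee. No intermediary sits in the middle.
        fee = tokens_m * price_per_m
        self.balances[developer] = self.balances.get(developer, 0.0) - fee
        self.balances[miner] = self.balances.get(miner, 0.0) + fee

net = ToyNetwork()
net.serve_inference("app_dev", "gpu_miner", tokens_m=5, price_per_m=0.025)
print(net.balances)   # {'app_dev': -0.125, 'gpu_miner': 0.125}
```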
Of course, technical feasibility is only part of the story. The harder question: in an era when demand for computing power is skyrocketing and the major players are committing hundreds of billions of dollars in capital expenditure, can a distributed compute network built on community contributions scale far enough to pose a real competitive threat?
Early data from Gonka offers one reference point: less than a year after mainnet launch, the network's aggregate computing power grew from 60 H100-equivalents to more than 10,000, driven by the spontaneous participation of hundreds of independent nodes worldwide rather than by centralized coordination. That does not prove the scaling problem is solved, but it shows the incentive mechanism effectively fueled early growth.
V. The Question of the Window Period
Historically, control over infrastructure has tended to converge rapidly in the early stages; this was true of the railway era, the internet era, and the mobile internet era alike. Each time, some found a way in before the standards solidified, while others realized only after centralization was complete how sharply their room to participate had narrowed.
Where does AI compute infrastructure stand today? Judging by the projected $570.8 billion in 2026 capital expenditures from the four major cloud providers, centralization is accelerating; judging by developers' actual usage patterns, the supply side still holds a large stock of untapped, poorly integrated capacity. That gap is the structural space in which decentralized networks can exist.
Daniil cited a parallel in his speech: after the dot-com bubble burst in 2000, what remained was not rubble, but a global network of fiber-optic cables that supported the digital economy for the next two decades. After the current boom in AI infrastructure investment subsides, the compute protocols and incentive mechanisms that endure will become the infrastructure of the next cycle—the question is, which protocols have a foundation strong enough to remain operational under pressure?
This is not a question about any specific project, but rather a challenge the entire decentralized AI sector must confront: Can governance design truly resist erosion by single points of control? Do incentive mechanisms remain effective at scale? Is the decentralization of the compute network genuinely upheld across three dimensions—technical execution, token issuance, and upgrade decision-making?
Conclusion
DeepSeek's price reduction has reignited the narrative of "AI democratization." But democratizing inference calls and democratizing compute infrastructure are two different things. The former is already happening; whether the latter can happen depends on how many people over the next few years truly treat it as an engineering problem worth solving, rather than just a compelling story.
