As AI models have moved into large-scale deployment, demand for inference compute has continued to rise. Inference, where models generate responses online or execute agent tasks, places different requirements on chip architecture, latency, and deployment cost than training does. TechCrunch reports that inference cloud provider General Compute is trying to enter this market with a lighter-weight deployment approach.
General Compute recently closed a $15 million seed round at a post-money valuation of $60 million. The round was led by FUSE VC, with participation from Carya Venture Partners and Village Global Ventures. The company positions itself as an "inference neocloud," primarily renting out the AI processing power needed for model inference.
Betting on SambaNova's inference chips
In the AI infrastructure market, GPUs remain the dominant choice, but an increasing number of companies are betting on chips specifically designed for inference workloads. The report notes that General Compute has opted to partner with SambaNova rather than directly competing for scarce GPU resources.
SambaNova is a chip company backed by Intel, long focused on inference computing. A co-founder of General Compute stated that SambaNova’s new chip, launching this year, will offer greater context memory capacity and a more flexible architecture during inference. According to the company, the new chip can process 600 to 700 tokens per second, compared to approximately 250 tokens per second for GPUs.
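To put the quoted throughput figures in concrete terms, the following minimal Python sketch converts tokens per second into generation time. The 150,000-token workload is a hypothetical value chosen for illustration, and the calculation assumes a steady decode rate with no prefill, queueing, or network overhead. The rates are the company's claims as reported, not measured benchmarks.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to emit output_tokens at a steady decode rate.

    Simplifying assumption: ignores prefill, queueing, and network latency.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Throughput figures quoted in the article (company claims, not benchmarks).
GPU_TPS = 250.0
SN50_TPS_LOW, SN50_TPS_HIGH = 600.0, 700.0

# Hypothetical workload: an agent run that emits 150,000 output tokens.
OUTPUT_TOKENS = 150_000

gpu_s = generation_seconds(OUTPUT_TOKENS, GPU_TPS)
sn50_slow_s = generation_seconds(OUTPUT_TOKENS, SN50_TPS_LOW)
sn50_fast_s = generation_seconds(OUTPUT_TOKENS, SN50_TPS_HIGH)

# The speedup depends only on the ratio of the two rates.
speedup_low = SN50_TPS_LOW / GPU_TPS
speedup_high = SN50_TPS_HIGH / GPU_TPS

print(f"GPU at {GPU_TPS:.0f} tok/s: {gpu_s / 60:.1f} min")
print(f"SN50 at {SN50_TPS_LOW:.0f}-{SN50_TPS_HIGH:.0f} tok/s: "
      f"{sn50_fast_s / 60:.1f} to {sn50_slow_s / 60:.1f} min")
print(f"Speedup: {speedup_low:.1f}x to {speedup_high:.1f}x")
```

Under these assumptions, the quoted rates correspond to a speedup of roughly 2.4x to 2.8x in raw token generation, whatever the size of the workload.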
General Compute has placed an order for $300 million worth of SambaNova SN50 chips and will be the first neocloud company to deploy this batch of chips.
Deployable directly in existing data centers
In addition to chip supply, another challenge in expanding AI computing power is the deployment of data centers. Many high-performance AI chips require liquid cooling and higher power configurations, which increase the cost of data center upgrades and extend the time to deployment.
General Compute's solution uses air-cooled, lower-power inference chips, allowing devices to be directly deployed in existing data centers without requiring large-scale infrastructure upgrades. For a new entrant in the inference cloud market, this means faster deployment of rentable computing power.
The company is currently pursuing colocation partnerships, placing its own hardware in third-party facilities. Partners include not only traditional data center operators but also cryptocurrency mining companies looking to pivot. The report notes that during certain periods, the cost of mining Bitcoin has exceeded its market price, prompting some mining operations to explore new uses for their infrastructure.
Cloud competition is shifting toward speed and cost
General Compute launched its cloud service last week, claiming the fastest speeds for running the open-source large model MiniMax 2.7. The company aims to cut coding agent tasks that previously took an hour down to just 5 to 10 minutes, and also seeks to lower inference costs for real-time scenarios such as customer service voice agents.
Investor Joe Hassleman believes this partnership resembles CoreWeave’s early expansion of computing power through NVIDIA. For SambaNova, General Compute is a crucial channel for its chips to enter high-growth scenarios.
The report suggests that inference cloud platforms are essentially betting on a market where multiple models and multiple agents coexist. If no single model provider dominates the market long-term, inference speed and unit cost will become more direct competitive metrics. The recent $113 million Series B funding round completed by OpenRouter reflects growing market demand for multi-model integration and token cost optimization.

