Helping businesses burn fewer tokens has become a wildly profitable business.

Author and source: Pencil News

Just now, Together AI announced the completion of an $800 million funding round, with a valuation of $8.3 billion.

What it does is simple: it lets businesses use large models, especially open-source large models, more cost-effectively.

This round of financing was led by Aramco Ventures, a subsidiary of Saudi Aramco, with participation from Vista Equity Partners, General Catalyst, Emergence Capital, NVIDIA, and Salesforce Ventures.

The company's last valuation was $3.3 billion; it has now more than doubled.

More importantly, Together AI's annualized revenue has reached $1.15 billion (approximately RMB 7.8 billion).

01 Finding business opportunities in open-source models

Together AI was founded in 2022—the same year ChatGPT was launched.

The founding team of Together AI has strong technical expertise, with a significant proportion of Asian members.

Founder and CEO Vipul Ved Prakash was born in New Delhi, India, and studied mathematics, physics, and computer science at St. Stephen’s College in Delhi before leaving to pursue software development. He co-founded the cybersecurity company Cloudmark and the social media search company Topsy. After Topsy was acquired by Apple, he worked at Apple on Siri search and AI-related projects.

CTO Zhang Ce graduated with a bachelor’s degree in mathematics from Peking University in 2008 and earned his Ph.D. from the University of Wisconsin-Madison. He has taught computer science at ETH Zurich and the University of Chicago. His research focuses on making machine learning cheaper, more trustworthy, and more accessible to a broader audience.

The team also includes several other experts.

Chris Ré is a professor of computer science at Stanford University and a serial entrepreneur. He co-founded SambaNova and Snorkel, as well as Lattice and Inductiv, both of which were later acquired by Apple, and he has had a significant impact on machine learning systems and AI infrastructure.

Tri Dao (Vietnamese) is the Chief Scientist at Together AI and an Assistant Professor in the Department of Computer Science at Princeton University. He completed his Ph.D. at Stanford under Chris Ré and is one of the primary authors of FlashAttention, a research breakthrough that enables Transformers to run faster and use less GPU memory.

Percy Liang (Chinese American) is the director of the Stanford Center for Research on Foundation Models and has long focused on language models, open models, and model evaluation.

The main team members of Together AI, from left to right. Top row: Prakash, Zhang Ce, Chris Ré, Tri Dao. Bottom row: Percy Liang, Zedlewski (Chief Product Officer), Kai Ma (Chief Revenue Officer), Shi Meicheng (Vice President of Finance).

After ChatGPT went viral, the market focused almost exclusively on closed-source large models, but Together AI bet on the other side.

Together AI initially gained attention for providing access to NVIDIA GPUs and has since expanded into a platform that helps developers build and customize open-source AI models.

Together AI customers stay not just for the GPU cards—they need a complete suite of services: model selection, training, fine-tuning, inference, deployment, evaluation, GPU clusters, dedicated endpoints, and cost optimization.

This is the business value of Together AI.

It’s not selling a “smarter AI.” It’s selling the ability to use AI more cheaply, stably, and controllably.

02 Charging per token, annualized revenue of $1.15 billion

The most notable figure for Together AI is not its $800 million in funding but its $1.15 billion in annualized revenue, with customers including AI-native companies such as Cursor, Cognition, and Decagon.

Together AI's revenue has grown extremely rapidly. In February 2024, its annualized revenue was approximately $30 million; by February 2025, it had surpassed $100 million. This year it has reached $1.15 billion, roughly 38 times the February 2024 figure in two years.
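The growth multiple can be checked with a few lines of arithmetic. The sketch below uses only the three revenue figures quoted in the article; the function name `growth_multiple` is illustrative, and the February 2025 figure is treated as exactly $100 million even though the article says only that revenue had "surpassed" that level.

```python
def growth_multiple(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    if start <= 0:
        raise ValueError("start must be positive")
    return end / start


# Annualized revenue quoted in the article, in millions of USD.
feb_2024 = 30.0
feb_2025 = 100.0  # assumption: "surpassed $100 million" taken as exactly 100
latest = 1150.0

overall = growth_multiple(feb_2024, latest)
first_leg = growth_multiple(feb_2024, feb_2025)
second_leg = growth_multiple(feb_2025, latest)

print(f"overall: {overall:.1f}x")
print(f"first year: {first_leg:.1f}x, second year: {second_leg:.1f}x")
```

Under these assumptions, most of the growth falls in the second year, which matches the article's point that enterprise usage accelerated recently.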

Together AI makes money by turning scattered open-source models into production-ready systems that enterprises are willing to pay for when they run large-model inference.

Its revenue comes from four main sources:

First, there is the inference API.

Enterprises are charged per token when calling models. The cost for 1 million tokens varies by model. For example, models such as DeepSeek V4 Pro, MiniMax M3, and Kimi K2.7 Code each have separate pricing for input, cached input, and output.
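To make the three-part pricing concrete, here is a minimal sketch of how a single request might be billed when input, cached input, and output tokens are priced separately per million tokens. The prices and token counts are hypothetical placeholders, not Together AI's actual rates, and `TokenPrice` and `request_cost` are names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Price in USD per 1 million tokens, by token class."""
    input: float
    cached_input: float
    output: float


def request_cost(price: TokenPrice, input_tokens: int,
                 cached_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request under per-million-token pricing."""
    total = (input_tokens * price.input
             + cached_tokens * price.cached_input
             + output_tokens * price.output)
    return total / 1_000_000


# Hypothetical prices, chosen only to keep the arithmetic easy to follow.
example_price = TokenPrice(input=1.00, cached_input=0.25, output=3.00)

cost = request_cost(example_price, input_tokens=8_000,
                    cached_tokens=4_000, output_tokens=1_000)
print(f"cost per request: ${cost:.4f}")
```

With these made-up numbers, the 4,000 cached tokens cost a quarter of what they would as fresh input, which is why separate cached-input pricing matters for agents that resend long contexts.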

Second, dedicated inference services.

Some customers with high usage volumes, or those with higher requirements for latency, stability, and security, cannot rely solely on public APIs. They need dedicated endpoints. Together AI’s pricing page also clearly states that many teams start with API calls and migrate to dedicated endpoints as their scale grows.
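On cost alone, the move from API calls to a dedicated endpoint can be thought of as a break-even calculation. The sketch below assumes, purely for illustration, that a dedicated endpoint is billed at a flat hourly rate and that API usage has a single blended per-million-token price; both numbers are hypothetical, and the calculation ignores the endpoint's throughput limit as well as the latency, stability, and security reasons mentioned above.

```python
def breakeven_tokens_per_month(hourly_rate_usd: float,
                               blended_price_per_million_usd: float,
                               hours_per_month: float = 730.0) -> float:
    """Monthly token volume above which a flat-rate endpoint is cheaper
    than paying per token (assumes the endpoint can serve that volume)."""
    if blended_price_per_million_usd <= 0:
        raise ValueError("price must be positive")
    monthly_fixed_cost = hourly_rate_usd * hours_per_month
    return monthly_fixed_cost / blended_price_per_million_usd * 1_000_000


# Hypothetical figures: $10/hour endpoint vs. $2 per million tokens via API.
threshold = breakeven_tokens_per_month(hourly_rate_usd=10.0,
                                       blended_price_per_million_usd=2.0)
print(f"break-even: {threshold / 1e9:.2f} billion tokens per month")
```

Below the threshold, pay-per-token is cheaper; above it, the fixed hourly cost is spread over enough tokens to win.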

Third, fine-tuning.

Enterprises don’t just want general-purpose models—they want models that understand their own business, customers, documents, products, and processes. Together AI offers Fine-Tuning services, charged based on the number of tokens processed during training and validation. The pricing page shows that costs vary depending on model size and training method.

Fourth, GPU clusters.

Some customers still need to train, fine-tune, or deploy models themselves. Together AI also offers GPU capacity billed by the hour. The pricing page shows that its GPU clusters support hardware such as H100, H200, and B200, with charges applied per GPU per hour.

Together AI’s business model combines these elements: it earns revenue whenever enterprises use models on its platform.

Together AI now processes over 400 trillion open-model inference tokens per month. A year ago, this number was only around 30 billion. Its token volume has grown approximately 13,000-fold in one year.
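To give a sense of scale, the monthly figures can be converted into a growth multiple and an average per-second rate. This is a rough sketch that takes the two token counts quoted in the article at face value and assumes a 30-day month with perfectly even load, which real traffic would not have.

```python
TOKENS_PER_MONTH_NOW = 400e12       # "over 400 trillion", per the article
TOKENS_PER_MONTH_YEAR_AGO = 30e9    # "around 30 billion", per the article
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month

fold_increase = TOKENS_PER_MONTH_NOW / TOKENS_PER_MONTH_YEAR_AGO
avg_tokens_per_second = TOKENS_PER_MONTH_NOW / SECONDS_PER_MONTH

print(f"growth: about {fold_increase:,.0f}-fold")
print(f"average rate: about {avg_tokens_per_second / 1e6:.0f} million tokens per second")
```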

This business has suddenly grown because companies have begun seriously using AI.

Customer service bots respond to customers daily. AI programming tools generate and modify code daily. Sales systems automatically write emails. Financial systems read files. Medical systems summarize patient records. AI agents repeatedly call models: first understanding the task, then researching information, then invoking tools, then generating results, and finally checking for errors.

Every step costs money.

A few days ago, a story went viral in Silicon Valley: a company spent as much as $500 million on Claude in one month—equivalent to approximately RMB 3.3 billion, or over RMB 100 million burned per day.

The opportunity for open-source models has arrived.

Enterprises don't need the most expensive, most powerful closed-source large models for every task. Often, they simply need a model that is good enough, fast enough, and affordable enough.

On OpenRouter, the share of tokens processed by open-source models rose from 34% in January to 65% in June. The shift is driven by Chinese open-source models narrowing the capability gap with top U.S. models while offering developers greater freedom for customization and fine-tuning. Some models cost as little as $0.18 per million tokens, compared with an average of about $4 for top-tier models.

Therefore, Together AI’s platform enables businesses to train and run AI workloads on open models such as DeepSeek, MiniMax, and Kimi at a lower cost than closed systems.

Together AI shared a striking statistic: companies using open models typically achieve a 6x to 20x reduction in costs; after migrating to Together AI, Decagon reduced its inference costs by 6x.
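The per-token prices quoted earlier give a rough upper bound on such savings. The sketch below compares the two price points from the article ($0.18 and about $4 per million tokens) for a hypothetical workload of 1 billion tokens per month; the workload size and the function name are assumptions made for illustration.

```python
def monthly_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD of processing `tokens` at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million_usd


OPEN_MODEL_PRICE = 0.18  # USD per million tokens, lowest figure cited
TOP_TIER_PRICE = 4.00    # USD per million tokens, approximate average cited

tokens_per_month = 1_000_000_000  # hypothetical workload

open_cost = monthly_cost_usd(tokens_per_month, OPEN_MODEL_PRICE)
top_cost = monthly_cost_usd(tokens_per_month, TOP_TIER_PRICE)
ratio = top_cost / open_cost

print(f"open model: ${open_cost:,.0f}, top-tier: ${top_cost:,.0f}, ratio: {ratio:.1f}x")
```

The cheapest-case ratio of roughly 22x sits just above the 6x to 20x range Together AI reports, which is plausible since few workloads can run entirely on the cheapest model.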

This is not a one-time transaction: once customers find that building the platform into their workflows saves them money, they keep generating recurring bills.

03 The most profitable AI sector: NVIDIA enters with major investment

Together AI is not the only inference infrastructure company being pursued by capital.

Last week, Pencil News reported that Baseten raised $1.5 billion in funding, reaching a valuation of $13 billion. Baseten also specifically noted that its revenue grew 20-fold over the past year due to increased enterprise demand for "inference."

In May of this year, Fireworks AI was reportedly in talks for a new funding round, with a potential valuation of up to $15 billion.

Investors are chasing these companies because they have found that, once large models begin to be commercialized, the real and sustained billing occurs at the inference layer.

This is highly instructive for Chinese companies.

Finance, manufacturing, government services, education, healthcare, customer service, e-commerce, and office software all have significant AI inference demands. These scenarios don’t necessarily require the most powerful models, but they do need model services that are controllable, cost-effective, secure, and easy to integrate into business processes.

This means opportunities will arise in areas such as inference cloud, model routing, domestic chip adaptation, industry-specific model deployment, AI cost management, agent scheduling, private deployment, and operations and maintenance services.

Finally, let’s look at the investors. NVIDIA appears on the investor lists of both Together AI and Baseten.

It’s no surprise that NVIDIA invests in companies like this: the more inference platforms there are, the greater the demand for GPUs. Together AI announced back in 2025, when it raised $305 million, that it planned to deploy NVIDIA Blackwell GPUs at scale. Together AI also expects its computing power and infrastructure to expand by approximately 50 times over the next five years.

Beyond NVIDIA, which sells GPUs, companies like Together AI and Baseten are becoming infrastructure gateways that attract joint investment from chipmakers, energy and power equipment providers, manufacturers, and enterprise software firms.

Salesforce also has compelling reasons to enter. The enterprise software giant is primarily concerned with whether AI can be integrated cost-effectively and reliably into sales, customer service, marketing, office, and management workflows. Investing in Together AI is akin to securing early positioning in the “power, water, and gas” infrastructure underlying enterprise AI workflows.

Aramco represents energy capital. AI inference may appear to be a software business, but it requires substantial electricity, data center infrastructure, and computational resources. Schneider Electric is a company focused on electrical and data center infrastructure, while Pegatron is a key player in electronics manufacturing and the server supply chain.

The funding rounds for Together AI and Baseten are not just investments in two startups—they are more like an industry signal: competition in AI infrastructure is shifting toward who can handle massive volumes of requests more cheaply and reliably.

This article is from the WeChat public account "Pencil News" (ID: pencilnews), authored by Pencil News, and published with permission from 36Kr.
