Chinese Enthusiast Runs 1 Trillion-Parameter Kimi K2.5 on RTX 3060 with 768GB Intel Optane Memory

CryptoBriefing

A trillion-parameter AI model just ran on a graphics card that most gamers would consider mid-range.

A Chinese AI enthusiast known as APFrisco demonstrated Moonshot AI’s Kimi K2.5 model, a Mixture-of-Experts (MoE) large language model with 1 trillion total parameters, running on a single Nvidia RTX 3060 GPU paired with 768 GB of Intel Optane Persistent Memory. The setup achieved roughly four tokens per second, which is slow by production standards but remarkable given the hardware involved.

How a mid-tier GPU handles a trillion parameters

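Kimi K2.5 doesn’t actually fire up all 1 trillion parameters at once. As a Mixture-of-Experts model, it routes each token through a small subset of its expert sub-networks, so only about 32 billion parameters are activated per generated token. The rest sit idle, waiting their turn.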
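To put that in numbers, here is a minimal Python sketch. The parameter counts come from the figures above; the routing function is only a toy illustration of top-k expert selection, and its scores and expert count are made up for the example rather than taken from Kimi K2.5's actual configuration.

```python
# Figures quoted in the article.
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32 billion activated per token


def active_fraction(active: int, total: int) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def route_top_k(scores: list[float], k: int) -> list[int]:
    """Toy MoE router: return the indices of the k highest-scoring experts.

    Real routers compute these scores with a learned gating layer;
    here they are simply passed in as a list.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active per token: {fraction:.1%}")  # 3.2%

# Hypothetical gate scores for four experts; only the top two are used.
example_scores = [0.1, 0.9, 0.3, 0.7]
print("Selected experts:", route_top_k(example_scores, k=2))  # [1, 3]
```

Only about 3% of the weights do work on any given token, but which 3% changes from token to token, so the whole model still has to be reachable in memory.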


Even with that efficiency trick, the model is enormous. The full Kimi K2.5 weighs in at approximately 630 GB. Quantized versions, which store the weights at lower numerical precision to cut memory requirements, still come in at around 381 GB. That’s why APFrisco needed 768 GB of Intel Optane Persistent Memory: no standard consumer RAM setup comes close to handling that kind of footprint.
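The sizes above can be sanity-checked with some quick arithmetic. The sketch below assumes 1 GB means 10^9 bytes and spreads the file size evenly over all 1 trillion parameters, which is a simplification; real checkpoints mix precisions across layers.

```python
TOTAL_PARAMS = 1e12          # 1 trillion parameters (from the article)
OPTANE_CAPACITY_GB = 768     # Optane PMem capacity (from the article)
FULL_MODEL_GB = 630          # approximate full model size (from the article)
QUANTIZED_GB = 381           # approximate quantized size (from the article)


def bits_per_param(model_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return model_gb * 1e9 * 8 / params


def headroom_gb(model_gb: float, capacity_gb: float = OPTANE_CAPACITY_GB) -> float:
    """Memory left over after loading the model, ignoring runtime overhead."""
    return capacity_gb - model_gb


for name, size in [("full", FULL_MODEL_GB), ("quantized", QUANTIZED_GB)]:
    print(
        f"{name}: {bits_per_param(size):.2f} bits/param, "
        f"{headroom_gb(size)} GB headroom in {OPTANE_CAPACITY_GB} GB"
    )
```

Under those assumptions the full model averages roughly 5 bits per parameter and the quantized one roughly 3, and either fits in 768 GB with room to spare for the operating system and inference runtime.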

Optane PMem DIMMs are an interesting choice. Intel discontinued its Optane line, which means these modules are now essentially legacy hardware floating around the second-hand market. They’re slower than traditional DRAM but vastly cheaper per gigabyte, making them an unconventional but surprisingly practical solution for loading massive models that would otherwise require enterprise-grade infrastructure.
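Memory speed matters because every generated token requires reading the active weights. As a rough back-of-envelope estimate, and not a measurement of APFrisco's system, the sketch below assumes the 381 GB quantized model is used, that its size is spread evenly across parameters, and that all 32 billion active parameters are read from memory for each token with nothing cached on the GPU.

```python
TOTAL_PARAMS = 1e12        # total parameters (from the article)
ACTIVE_PARAMS = 32e9       # parameters activated per token (from the article)
QUANTIZED_BYTES = 381e9    # quantized model size, assuming 1 GB = 1e9 bytes
TOKENS_PER_SECOND = 4.0    # reported generation speed (from the article)


def bytes_read_per_token(model_bytes: float, active: float, total: float) -> float:
    """Bytes of weights touched per token if size is uniform across parameters."""
    return model_bytes * active / total


def required_bandwidth(bytes_per_token: float, tokens_per_second: float) -> float:
    """Sustained read bandwidth (bytes/s) implied by a given generation speed."""
    return bytes_per_token * tokens_per_second


per_token = bytes_read_per_token(QUANTIZED_BYTES, ACTIVE_PARAMS, TOTAL_PARAMS)
bandwidth = required_bandwidth(per_token, TOKENS_PER_SECOND)

print(f"Weights read per token: {per_token / 1e9:.1f} GB")        # 12.2 GB
print(f"Implied read bandwidth: {bandwidth / 1e9:.1f} GB/s")      # 48.8 GB/s
```

The real figure is likely lower, since some weights that are used on every token can stay in the GPU's VRAM, but the estimate shows why memory bandwidth rather than GPU compute tends to set the pace in a setup like this.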

The RTX 3060 launched in early 2021 with 12 GB of VRAM. It was designed for 1080p gaming and light creative workloads, not for running frontier AI models.

What typical Kimi K2.5 deployments look like

By contrast, high-performance inference for Kimi K2.5 typically runs on servers with up to eight high-end GPUs. Those setups deliver anywhere from 10 to more than 300 tokens per second, compared with roughly four on APFrisco’s machine.
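To make the speed gap concrete, the short sketch below converts tokens per second into waiting time. The 500-token response length is a hypothetical figure chosen for illustration; the throughput numbers are the ones quoted in this article.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time needed to generate num_tokens at a steady generation speed."""
    return num_tokens / tokens_per_second


RESPONSE_TOKENS = 500  # hypothetical response length, not from the article

setups = [
    ("RTX 3060 + Optane (about 4 tok/s)", 4.0),
    ("multi-GPU server, low end (10 tok/s)", 10.0),
    ("multi-GPU server, high end (300 tok/s)", 300.0),
]

for label, speed in setups:
    elapsed = seconds_to_generate(RESPONSE_TOKENS, speed)
    print(f"{label}: {elapsed:.1f} s for {RESPONSE_TOKENS} tokens")
```

At four tokens per second, a 500-token answer takes just over two minutes, against under two seconds at the top of the multi-GPU range.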

The demonstration was shared on Reddit’s r/LocalLLaMA community and subsequently covered by Tom’s Hardware.

Kimi K2.5 itself was released on January 27, 2026, by Moonshot AI. It features multimodal capabilities and was trained on roughly 15 trillion mixed visual and text tokens. It’s an open-weight model, meaning anyone can download and run it, which is precisely what made APFrisco’s experiment possible in the first place.
