Tether Launches TurboQuant to Enable Larger AI Models on Devices

Tether’s TurboQuant cuts AI memory use by up to 5x, helping devices handle longer tasks locally.
QVAC 0.12.0 lets developers run larger AI workloads on laptops and phones with less memory strain.
TurboQuant tackles AI’s memory bottleneck, enabling longer chats, larger files, and bigger code projects.

Tether has added a new memory optimization tool to QVAC SDK 0.12.0, a move that could help laptops, smartphones, and other devices handle larger workloads locally. Announcing the update on X, CEO Paolo Ardoino said the release includes TurboQuant, a technology that reduces AI memory requirements by up to five times while maintaining nearly the same output quality.

The update targets a key limitation for large language models: memory. As conversations and tasks grow longer, the model must hold more context in working memory, and that demand climbs with every added token. TurboQuant reduces the burden, allowing devices to work with larger documents, longer conversations, and more information at once.

🚨🤖Tether AI ships TurboQuant KV-Cache Quantization within QVAC SDK 0.12.0, compressing the KV cache memory requirements by up to 5x, near-lossless.

Effective high-quality local AI is one step closer! https://t.co/wZjXgR0Bu5
— Paolo Ardoino 🤖 (@paoloardoino) June 1, 2026

The release also adds text-to-video generation, robot control features, coding assistant support, voice processing upgrades, and faster image classification tools.

TurboQuant Targets AI’s Memory Bottleneck

TurboQuant sits at the center of the QVAC SDK 0.12.0 release. The technology compresses the KV cache, a type of working memory that AI models use to keep track of conversations, documents, and other information during a session.

Memory demands rise as users feed more information into a model. Tether said a 4-billion-parameter model processing about 262,000 tokens can require roughly 8 GB of memory for the cache alone. Running several sessions at that scale can quickly exceed the limits of many laptops and consumer devices.
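To make that figure concrete, the sketch below applies the standard back-of-the-envelope estimate for KV cache size: one key vector and one value vector per token, per layer, per KV head. The layer count, KV head count, head dimension, and 16-bit storage are hypothetical values that do not come from Tether; they were picked only so the arithmetic lands near the roughly 8 GB the company cited, and the token count assumes "about 262,000" means 262,144.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size in bytes for a single session.

    The leading factor of 2 accounts for storing both a key and a value
    vector for every token, in every layer, for every KV head.
    """
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


# Hypothetical model shape (NOT published by Tether), stored in 16-bit floats.
ASSUMED_SHAPE = {"layers": 32, "kv_heads": 2, "head_dim": 128}
ASSUMED_TOKENS = 262_144  # assumption: "about 262,000 tokens" read as 2**18

size = kv_cache_bytes(ASSUMED_TOKENS, **ASSUMED_SHAPE)
print(f"Estimated KV cache: {size / 2**30:.1f} GiB")
```

Because every term is multiplied together, the estimate grows in direct proportion to the number of tokens: doubling the context doubles the cache.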

TurboQuant aims to reduce that pressure. According to Tether, the technology can shrink KV cache memory requirements by up to five times while preserving nearly the same output quality. As a result, users can work with longer conversations, larger documents, and bigger codebases without relying as heavily on remote computing resources.
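As a rough illustration of what that reduction could mean in practice, the sketch below takes the article's figures (about 8 GB of cache per session, up to 5x compression) and counts how many sessions fit in a fixed memory budget. The 16 GiB budget is a hypothetical number, the calculation ignores model weights and other memory use, and it assumes the best-case 5x ratio, which Tether describes as an upper bound.

```python
GIB = 2**30


def sessions_that_fit(budget_bytes, per_session_bytes):
    """Count whole sessions whose caches fit inside the budget."""
    return budget_bytes // per_session_bytes


per_session = 8 * GIB          # cache size per session, from the article
best_case_ratio = 5            # "up to five times", treated as best case
budget = 16 * GIB              # hypothetical memory set aside for caches

# Integer division keeps the arithmetic exact (no floating-point rounding).
compressed = per_session // best_case_ratio

print("Sessions before compression:", sessions_that_fit(budget, per_session))
print("Sessions after compression: ", sessions_that_fit(budget, compressed))
```

Under these assumptions the same budget goes from holding two long sessions to holding ten; a lower real-world ratio would shrink that gain accordingly.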

QVAC Expands Beyond Language Models

The update includes more than memory improvements. QVAC SDK 0.12.0 adds several new tools aimed at expanding what developers can run on local devices.

Among the additions is support for text-to-video generation through the Wan2.1 model. The platform also introduces a vision-language-action feature that allows developers to build applications for robotic control.

The release further adds a lightweight image classification tool designed for tasks that do not require larger vision models. QVAC has also moved its text-to-speech and transcription systems to its GGML engine, a change that broadens support across major desktop and mobile operating systems.

Developers also gain new options for coding assistants. QVAC now integrates with OpenCode and OpenClaw through a provider package that simplifies model management and deployment.

Open-Source AI Moves Closer to the Edge

The release reflects Tether’s push to run more computing tasks directly on users’ devices rather than relying entirely on centralized data centers. The company has increasingly concentrated on software that can operate across personal devices, local networks, and decentralized systems.

“Google’s research showed that AI memory could be compressed far more efficiently than most people assumed. Our work brings that breakthrough into production software that developers, startups, and users can actually build with,” said Ardoino.

He added, “People should be able to ask an AI assistant to read a long document, remember a project, help with code, or work through private information without every task being forced through a remote data center.”

The launch comes as Tether broadens its AI efforts beyond memory optimization. Ardoino recently disclosed that the company is developing an open-source peer-to-peer search engine and shared a demonstration of a decentralized Wikipedia search system.
