Tether Launches QVAC, a Local AI Platform to Challenge Cloud-Based Models

CoinDesk reports:

Can QVAC build a model strong enough to make users willing to accept a moderate operational barrier for local, self-controlled ownership?

Written by: Liam Akiba Wright

Compiled by: Luffy, Foresight News

Tether’s new project, QVAC, begins with a concept rare among stablecoin companies: it describes its QVAC Psy as a series of foundational large models "rooted in the principles of psychohistory."

The concept of psychohistory originates from Isaac Asimov’s classic science fiction series, Foundation. In the books, the protagonist Hari Seldon uses mathematics, statistics, and social dynamics to predict the behavior of large populations, thereby shortening the dark age that follows the collapse of the Galactic Empire.

The Encyclopedia of Science Fiction defines psychohistory, as depicted by Asimov, as a fictional science; Hari Seldon’s entire plan aims to predict future events and preserve human knowledge and civilization in the event of societal collapse.

Tether's statement is, in fact, a sci-fi-inspired packaging of its corporate mission.

With its reserve assets, liquidity, and distribution channels, Tether has created the largest stablecoin system in the cryptocurrency industry; now, it is replicating this underlying logic in the field of artificial intelligence.

The reserves behind the USDT stablecoin form Tether’s primary asset base; meanwhile, computing power, AI models, datasets, and intelligent capabilities that can operate independently of centralized cloud platforms are becoming its second major reserve asset.

Transitioning from USD reserves to intelligent asset reserves

Tether's entry into artificial intelligence follows the same operational logic as its core business: converting global offshore USD demand into a reserve asset portfolio primarily composed of short-term sovereign bonds.

According to Tether's Q1 2026 reserve attestation report, the company achieved a net profit of $1.04 billion, with reserve buffer funds totaling $8.23 billion, token-related liabilities of approximately $183 billion, and direct and indirect holdings of U.S. Treasury bills amounting to approximately $141 billion.
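To put the attestation figures in proportion, the quoted numbers can be turned into simple ratios. The sketch below uses only the rounded values cited above; the variable names are illustrative, and the ratios are rough approximations rather than figures from the attestation itself.

```python
# Figures quoted in the paragraph above, in billions of USD (approximate).
reserve_buffer = 8.23       # reserve buffer funds
token_liabilities = 183.0   # token-related liabilities
treasury_bills = 141.0      # direct and indirect U.S. Treasury bill holdings

# Buffer and T-bill holdings expressed as a share of token liabilities.
buffer_ratio = reserve_buffer / token_liabilities
tbill_ratio = treasury_bills / token_liabilities

print(f"Reserve buffer vs. liabilities: {buffer_ratio:.1%}")  # 4.5%
print(f"T-bills vs. liabilities:        {tbill_ratio:.1%}")   # 77.0%
```

On these rounded inputs, the buffer is roughly 4.5% of token liabilities, and Treasury bills alone are equivalent to about 77% of them.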

A strong reserve base enables Tether to generate consistent revenue, maintain a robust balance sheet, and leverage operating profits to invest in long-term infrastructure initiatives.

CryptoSlate previously noted that the sheer scale of its stablecoin gives Tether room to allocate its reserve assets strategically. In January of this year, Tether bought 8,888 bitcoin, demonstrating its ability to convert interest income and operating profits into long-term bitcoin holdings. The QVAC project extends this asset allocation logic to the new frontier of artificial intelligence.

In addition to its existing investments in Bitcoin, gold, startups, the energy sector, cryptocurrency mining, and communications infrastructure, Tether has now made a significant bet on artificial intelligence itself. This strategic positioning transforms Tether from a mere issuer of private dollar liquidity into a builder of private digital infrastructure.

The science fiction narrative of "psychohistory" fits this strategic direction, because Tether views artificial intelligence as a foundational layer of civilization rather than just another software sector. QVAC's official materials position it as an "infinite stable intelligent platform": a decentralized intelligent system designed to run locally and intended to compete with, and ultimately replace, centralized AI.

QVAC’s vision states that relying on centralized servers to handle all intelligent interactions is not only slow and unstable, but also carries the risk of censorship and control; QVAC is committed to becoming the edge-layer foundation for users’ dedicated intelligent systems.

This philosophy aligns with Tether’s stablecoin principles: permissionless fund transfers, user control over data, and local, on-device AI execution.

Beneath Asimov’s science fiction concept lies Tether’s more serious assertion: only when artificial intelligence achieves infrastructure-level resilience and risk resistance will its value truly solidify.

Although cloud-based large models offer superior overall capabilities, they come with inherent risks such as platform dependency, pricing volatility, regulatory compliance, network latency, and data routing issues; on-premises AI models, while sacrificing some performance, provide greater ownership, privacy, and consistent reliability.

This trade-off logic aligns closely with the principles of the crypto industry. While self-custody is less convenient than exchange custody, people only truly understand its value after exchange failures occur. Similarly, while local AI is less user-friendly than cloud-hosted models, the advantages of local deployment become evident when network outages, API changes, account bans, or data leakage restrictions arise.

QVAC: An Edge AI Architecture on a Different Path

QVAC’s core differentiation lies in its underlying architecture. Leading AI labs such as OpenAI, Anthropic, Google DeepMind, and xAI are competing on general capabilities, coding proficiency, multimodal interaction, long-context reasoning, agent applications, and enterprise cloud deployment.

QVAC, on the other hand, chose a completely different path: deployability, privacy protection, low latency, composability, and the ability to exist independently of any single platform.

The QVAC official getting-started documentation defines the project as an open-source, cross-platform ecosystem focused on locally run, peer-to-peer AI applications, compatible with Linux, macOS, Windows, Android, and iOS. Users can execute AI tasks such as large language models, speech recognition, and retrieval-augmented generation (RAG) locally, or delegate inference tasks to other device nodes via built-in P2P functionality.

This means QVAC’s benchmark standards are fundamentally different from those of leading cloud-based AI large models: cutting-edge AI seeks to deliver the strongest general-purpose model capabilities enabled by centralized services; QVAC focuses on where inference occurs, control over execution, whether data remains local to the device, and whether applications can continue operating if centralized services fail.

Tether launched the QVAC software development kit (SDK) in April 2026, providing a unified development suite that enables developers to build, run, and fine-tune AI applications on any device, compatible with all platform systems without requiring code modifications.

The QVAC SDK is built on a unified abstraction layer to support various local inference engines, including the proprietary QVAC Fabric and a forked version of llama.cpp, while integrating speech and translation tools such as whisper.cpp, Parakeet, and Bergamot.

QVAC has moved well beyond the scope of a single model release and more closely resembles an underlying AI operating system. The open-source AI ecosystem already offers a wide array of mature components: open-weight model families such as Llama, Qwen, Mistral, Gemma, and DeepSeek are thriving, as are distribution and local inference tools such as Hugging Face, llama.cpp, and Ollama.

QVAC’s core bet is that developers urgently need a comprehensive edge framework that integrates the entire workflow—model loading, inference, speech recognition, OCR, translation, text-to-image generation, retrieval-augmented generation, P2P model distribution, delegated inference, and local fine-tuning—through a unified interface.

QVAC is committed to becoming the foundational layer for intelligent computing distribution, leveraging continuously upgraded mid-tier local models to capture the edge AI ecosystem entry point.

QVAC Fabric is the core of the entire technical architecture. Tether states that Fabric can perform model fine-tuning on mainstream consumer-grade hardware using Vulkan and Metal backends, enabling compatibility with Android devices equipped with Qualcomm Adreno or ARM Mali GPUs, Apple Silicon devices, and Windows and Linux computers with AMD, Intel, or NVIDIA hardware.

Fabric also employs dynamic tiling to accommodate mobile device memory constraints and supports GPU-accelerated LoRA fine-tuning workflows, including instruction tuning with masked loss.

If this workflow can be empirically validated by external developers, its value will far exceed that of a typical open-source model release: the model weights are merely the foundational layer, while local personalized fine-tuning and adaptation represent the core incremental value.

MedPsy: QVAC faces its first rigorous test of strength

MedPsy is QVAC's first flagship model product to be deployed. A technical report published on Hugging Face on May 7th revealed that QVAC MedPsy is a healthcare-focused language model designed specifically for edge deployment, available in two versions: 1.7 billion and 4 billion parameters.

The team makes a bold claim: after rigorous medical-specific training, small models can outperform much larger medical models on benchmarks while still running on laptops, high-end mobile devices, and even smartphones.

QVAC stated that MedPsy-1.7B achieved an average score of 62.62 across seven closed medical benchmarks, significantly outperforming Google's MedGemma-1.5-4B-it at 51.20, despite having less than half the parameters; MedPsy-4B achieved an average score of 70.54, slightly surpassing MedGemma-27B-text-it's 69.95, while having only one-seventh of its parameters.

On the HealthBench and HealthBench Hard benchmarks, the gap widens further: MedPsy-4B scores 74.00 and 58.00, respectively, while MedGemma-27B-text-it scores only 65.00 and 42.67.
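The size-versus-score comparisons above reduce to two numbers per pairing: the score margin in points and the parameter ratio. The following is a minimal sketch that recomputes both from the figures QVAC reported; the `Result` class and `compare` helper are illustrative names, and parameter counts are taken nominally from the model names rather than from measured weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    params_b: float  # nominal parameter count in billions, read off the model name
    score: float     # average over the seven medical benchmarks, as reported by QVAC


def compare(small: Result, large: Result) -> tuple[float, float]:
    """Return (score margin in points, parameter ratio small/large)."""
    return small.score - large.score, small.params_b / large.params_b


pairs = [
    (Result("MedPsy-1.7B", 1.7, 62.62), Result("MedGemma-1.5-4B-it", 4.0, 51.20)),
    (Result("MedPsy-4B", 4.0, 70.54), Result("MedGemma-27B-text-it", 27.0, 69.95)),
]

for small, large in pairs:
    margin, ratio = compare(small, large)
    print(f"{small.name} vs {large.name}: {margin:+.2f} points at {ratio:.1%} of the parameters")

# HealthBench and HealthBench Hard margins, MedPsy-4B minus MedGemma-27B-text-it.
healthbench_margin = 74.00 - 65.00
healthbench_hard_margin = 58.00 - 42.67
print(f"HealthBench: {healthbench_margin:+.2f}, HealthBench Hard: {healthbench_hard_margin:+.2f}")
```

On the reported numbers, the margins are about 11.4 and 0.6 points at roughly 42.5% and 14.8% of the parameters, and the HealthBench gaps are 9.0 and about 15.3 points. These are recomputations of self-reported scores, not independent measurements.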

If these benchmark results can be replicated by third parties, they will directly validate QVAC’s core principle: in specific high-value vertical domains, lightweight edge models can challenge massive cloud-based systems.

The training process also reflects QVAC’s competitive strategy. MedPsy uses Qwen 3 as its backbone model and undergoes multi-stage supervised fine-tuning followed by iterative reinforcement learning on medical Q&A. During experimentation, over 30 million synthetic data samples were generated and a two-stage curriculum was applied, with the Baichuan M3-235B large model serving as the teacher model for long-form reasoning supervision.

Its training corpus has not yet been made public, which is a key concern: all currently impressive benchmark scores come from internal evaluations by QVAC, and critical issues such as potential data contamination, coverage scope, prompt construction, and the influence of teacher models still require external validation.

Quantized deployment is a significant advantage: the QVAC team has released GGUF quantized versions compatible with llama.cpp and the QVAC SDK. Q4_K_M quantization reduces model size by 69% with an average score loss of less than 1 point. At the best balance of size and performance, the 4-billion-parameter model is only 2.72 GB and the 1.7-billion-parameter version just 1.28 GB, small enough to deploy easily on local devices.
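The quantization figures can be sanity-checked with a little arithmetic. This sketch assumes that the 69% reduction applies to the quoted file sizes, that 1 GB means 10^9 bytes, and that the nominal parameter counts (4 billion and 1.7 billion) are close to the real ones; none of these assumptions is confirmed by the report as summarized here, and the helper names are illustrative.

```python
REDUCTION = 0.69  # size reduction attributed to Q4_K_M quantization

quantized_gb = {"MedPsy-4B": 2.72, "MedPsy-1.7B": 1.28}   # quoted file sizes
nominal_params_b = {"MedPsy-4B": 4.0, "MedPsy-1.7B": 1.7}  # assumed from model names


def implied_original_gb(quantized: float, reduction: float) -> float:
    """Size before quantization implied by a fractional reduction."""
    return quantized / (1.0 - reduction)


def bits_per_param(size_gb: float, params_b: float) -> float:
    """Average stored bits per parameter (GB = 1e9 bytes, params in billions)."""
    return size_gb * 8.0 / params_b


for name, size in quantized_gb.items():
    original = implied_original_gb(size, REDUCTION)
    bits = bits_per_param(size, nominal_params_b[name])
    print(f"{name}: {size:.2f} GB quantized, ~{original:.2f} GB implied original, ~{bits:.2f} bits/param")
```

Under these assumptions the implied unquantized sizes come out near 8.8 GB and 4.1 GB, and the quantized files average roughly 5.4 and 6.0 bits per nominal parameter. The bits-per-parameter values are only estimates, since true parameter counts usually differ somewhat from the rounded figure in a model's name.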

QVAC also explicitly warns of risks: MedPsy supports only text-based interaction and is limited to English usage; it is not suitable for clinical emergency scenarios, is subject to inherent hallucinations of large models, and requires developers to ensure user privacy and security throughout the entire application architecture.

The healthcare field itself has a strong inherent need for local inference, making MedPsy's prospects promising; however, its true capabilities can only be validated when external researchers reproduce the benchmark scores and test it in real clinical workflows.

Convenience vs. Control: The Ultimate Trade-off in the AI Industry

The debate between local AI and cloud AI is often simplified as a choice between privacy and performance. QVAC reframes it as, fundamentally, a trade-off between convenience and autonomous control.

Cloud-based AI excels in ultimate ease of use: users simply open the app, enter a command, and receive results—without needing to worry about complex technical details such as model weights, device VRAM, quantization parameters, vector embeddings, or runtime compatibility. The platform handles all technical complexity. This unparalleled convenience is also the core reason centralized AI platforms have risen so rapidly, enabling users to access cutting-edge intelligent capabilities with minimal barriers.

QVAC requires developers and users to take on greater operational responsibility in exchange for a new security model: local execution, offline availability, reduced data leakage, no dependence on hosted APIs, and peer-to-peer inference and model distribution.

According to the Tether SDK documentation, applications powered by QVAC can operate stably even under weak network conditions, and AI can continue functioning even when offline. The early 2025 QVAC announcement further outlined that AI agents can be deployed directly on local devices, enabling peer-to-peer (P2P) network collaboration between devices; when paired with the WDK suite, these AI agents can autonomously execute Bitcoin and USDT asset transactions.

This is precisely Tether's complete top-level logic: funds, computing power, and agents all follow the same autonomous sovereignty design paradigm.

Of course, its decentralization narrative is not without flaws. By allowing users to download models, run them on their own devices, and keep sensitive data on-device, QVAC achieves a high degree of decentralization at the inference layer, removing the control over every prompt that a platform holds with hosted APIs. Built on the Holepunch network architecture, QVAC also supports decentralized inference delegation and peer-to-peer model distribution, which is a substantive architectural innovation.

However, governance still exhibits centralized characteristics. QVAC is fully funded, named, and marketed by Tether, with its flagship application, model architecture, SDK roadmap, and the concept of "Stable Intelligence" all led by a single company.

This does not conflict with its local-first core value; rather, it confines the decentralization claim to the layer where the evidence is strongest: inference. The ecosystem still needs to gradually establish distributed governance in areas such as default registered nodes, version release channels, security standards, model admission, and long-term community governance.

Replication tests will determine how far QVAC can go

Today, QVAC’s credibility depends entirely on third-party reproducibility. If MedPsy’s benchmark scores can be replicated in external evaluation environments, Tether will truly realize the concept of “intelligent asset reserves”: a lightweight, open-source, locally deployable vertical model capable of rivaling cloud-based giant models in highly sensitive domains.

Even if third-party testing narrows or even reverses the performance gap, QVAC’s infrastructure value remains valid, though the narrative around model performance may be weakened. The ultimate industry question still returns to the timeless law of technology: ultimate convenience breeds centralization of power, while autonomous control requires bearing operational costs.

This is precisely the value of Asimov’s science fiction vision: psychohistory in "Foundation" studies the evolutionary patterns of complex large-scale systems under pressure; Tether imbues it with new meaning by focusing on how infrastructure can resist centralized monopolies.

The science fiction narrative is grand in scope, with technology still in its early stages of implementation, but the overall strategic logic is clear and coherent. Tether is leveraging the continuous cash flow from the world’s largest stablecoin to build an AI architecture centered on local execution, peer-to-peer networks, open-source tools, and lightweight edge models, extending the principle of sovereign autonomy from the monetary domain to the realm of intelligence.

The industry no longer questions whether stablecoin giants have the capability to enter the AI space—the answer is obvious.

The real core question is whether QVAC can build a robust enough model and infrastructure to make users willing to accept a moderate operational barrier in exchange for local, self-controlled ownership.

MedPsy is the first measurable threshold. Third-party replication results will determine whether QVAC's psychohistory narrative remains merely a sci-fi metaphor or formally enters the mainstream edge AI race as a foundational architecture with a complete operational logic.