Intel Releases Three INT4 Quantized Versions of Alibaba's Wan2.2 Video Models

On April 21 (UTC+8), according to BlockBeats, Intel AI engineering lead Haihao Shen announced that Intel has uploaded three INT4 quantized versions of Alibaba's Wan2.2 video model to Hugging Face: T2V-A14B (text-to-video), I2V-A14B (image-to-video), and TI2V-5B (hybrid text-and-image-to-video). All three are quantized to W4A16 using AutoRound, the Intel quantization toolkit of which Shen is a primary author. INT4 cuts each weight from 2 bytes (BF16) to 0.5 bytes, shrinking the weights to roughly one quarter of their original size. The two A14B models use an MoE architecture with 27B total parameters and 14B activated per step; official documentation states that 720P generation in the original precision requires at least 80GB of VRAM per GPU. TI2V-5B is a dense model that already runs 720P@24fps on a single RTX 4090. Intel has not published post-quantization figures for VRAM usage or visual quality, so those claims await third-party replication. None of the three models use the mainline vLLM inference pipeline; the README directs users to Intel's vllm-omni branch (feats/ar-w4a16-wan22), which must be installed to deploy them. The release reflects the broader industry push toward computational efficiency in AI inference. (Source: BlockBeats)
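
To make the one-quarter figure concrete, here is a back-of-the-envelope weight-size estimate using only numbers from the report (27B total parameters, 2 bytes per BF16 weight, 0.5 bytes per INT4 weight). It counts weight bytes alone; real checkpoints also carry quantization scales, unquantized layers, and runtime activation/latent buffers, so actual VRAM usage will be higher.

```python
# Approximate weight footprint of the Wan2.2 A14B checkpoints.
# Inputs are the figures quoted in the article; everything else
# (scales, zero-points, activations, latents) is deliberately ignored.
GIB = 1024 ** 3

def weight_size_gib(n_params: float, bytes_per_weight: float) -> float:
    """Size of the raw weights alone, in GiB."""
    return n_params * bytes_per_weight / GIB

total_params = 27e9  # A14B MoE: 27B total parameters

bf16 = weight_size_gib(total_params, 2.0)  # BF16: 2 bytes/weight
int4 = weight_size_gib(total_params, 0.5)  # INT4: 0.5 bytes/weight

print(f"BF16 weights: {bf16:.1f} GiB")                             # ~50.3 GiB
print(f"INT4 weights: {int4:.1f} GiB ({int4 / bf16:.0%} of BF16)") # ~12.6 GiB, 25%
```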
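For reference, AutoRound's publicly documented workflow for quantizing a Hugging Face transformer checkpoint to W4A16 looks roughly like the sketch below. The model name and output directory are placeholders, and Wan2.2 is a diffusion-style video model, so Intel's actual recipe (and the vllm-omni branch required to serve the result) almost certainly differs from this generic text-model path.

```python
# Minimal AutoRound W4A16 sketch, following the intel/auto-round README.
# Placeholder model; NOT the command sequence Intel used for Wan2.2.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # illustrative stand-in checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# W4A16: 4-bit weights, activations kept in 16-bit.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./opt125m-w4a16", format="auto_round")
```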