According to 1M AI News monitoring, Google Research has released the quantization compression algorithm TurboQuant, which can compress the KV cache of large language models to 3 bits, reducing memory usage by at least 6x without requiring training or fine-tuning and without sacrificing model accuracy. In 4-bit mode, attention computation speed on NVIDIA H100 GPUs increases by up to 8x compared to the 32-bit unquantized baseline.
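For a rough sense of where the memory saving comes from, the sketch below computes KV cache size at different bit widths. The model shape and the `kv_cache_bytes` helper are hypothetical and chosen only for illustration. The announcement does not say which baseline precision the 6x figure is measured against, so both 16-bit and 32-bit baselines are shown. The sketch also ignores any per-block metadata a real quantization scheme might store.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bits):
    """Bytes needed to hold keys and values for one sequence."""
    elements = 2 * layers * kv_heads * head_dim * seq_len  # 2 = keys + values
    return elements * bits / 8


# Hypothetical model shape, not taken from the announcement.
shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32_768)

quantized = kv_cache_bytes(bits=3, **shape)
print(f"3-bit cache: {quantized / 2**30:.2f} GiB")

for baseline_bits in (16, 32):
    baseline = kv_cache_bytes(bits=baseline_bits, **shape)
    print(
        f"{baseline_bits}-bit cache: {baseline / 2**30:.2f} GiB, "
        f"ratio vs 3-bit: {baseline / quantized:.2f}x"
    )
```

The ratio is simply the baseline bit width divided by 3, so it depends entirely on which baseline is assumed: about 5.3x against 16 bits and about 10.7x against 32 bits.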
The research team validated TurboQuant on long-context benchmarks including LongBench, Needle In A Haystack, and ZeroSCROLLS using Gemma and Mistral models, and reports the best results on every test. The algorithm combines two sub-algorithms: PolarQuant applies a polar coordinate transformation to eliminate the memory overhead of traditional quantization methods, and QJL corrects the residual error using only 1 bit.
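To illustrate how a single bit per coordinate can still carry useful information, here is a minimal sketch of a sign-quantized random projection. It assumes QJL works along the lines of a quantized Johnson-Lindenstrauss transform. The Gaussian projection, the helper names, and the sizes `d` and `m` are all assumptions for illustration, not TurboQuant's actual implementation. In the announced method this step is applied to the residual left by the first-stage quantizer; the sketch applies it directly to a vector `k` to keep things short.

```python
import math
import random


def make_projection(m, d, seed=0):
    """Random m x d Gaussian matrix shared by the encoder and the estimator."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def project(S, x):
    return [sum(s * v for s, v in zip(row, x)) for row in S]


def encode(S, k):
    """Keep one sign bit per projected coordinate, plus the vector's norm."""
    signs = [1 if p >= 0 else -1 for p in project(S, k)]
    norm = math.sqrt(sum(v * v for v in k))
    return signs, norm


def estimate_dot(S, q, signs, norm):
    """Estimate <q, k> from a full-precision q and the 1-bit sketch of k."""
    acc = sum(p * s for p, s in zip(project(S, q), signs))
    # sqrt(pi / 2) undoes the shrinkage that taking signs introduces.
    return math.sqrt(math.pi / 2) * norm * acc / len(S)


d, m = 32, 2048  # illustrative sizes only
rng = random.Random(1)
k = [rng.gauss(0.0, 1.0) for _ in range(d)]
q = [rng.gauss(0.0, 1.0) for _ in range(d)]

S = make_projection(m, d)
signs, norm = encode(S, k)
exact = sum(a * b for a, b in zip(q, k))
approx = estimate_dot(S, q, signs, norm)
print(f"exact: {exact:.3f}  estimate from sign bits: {approx:.3f}")
```

The estimate is unbiased for Gaussian projections, and its error shrinks roughly as one over the square root of `m`, which is why sign bits alone can be enough to correct a small residual.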
The research, led by Google Research’s Amir Zandieh and Vice President and Google Fellow Vahab Mirrokni, in collaboration with KAIST in South Korea and New York University, will be presented at ICLR 2026. Google states that one of the primary applications of this technology is addressing the KV cache bottleneck in models like Gemini.
Google Research introduces TurboQuant: 3-bit quantization with no accuracy loss, speeding up inference by up to 8x
Google Research has unveiled TurboQuant, a 3-bit quantization method that reduces KV cache memory usage by at least 6x without accuracy loss. On NVIDIA H100 GPUs, 4-bit attention computation runs up to 8x faster than the 32-bit unquantized baseline. Tested on Gemma and Mistral using LongBench, Needle In A Haystack, and ZeroSCROLLS, the method achieved top results. Developed by Amir Zandieh and Vahab Mirrokni in collaboration with KAIST and NYU, the paper will be presented at ICLR 2026.