Google Launches Gemini 3.1 Flash-Lite at One-Eighth the Input Cost of Pro, Outperforming GPT-5 Mini and Claude 4.5 Haiku on Six Benchmarks

On March 4, 2026, Google announced the preview of Gemini 3.1 Flash-Lite, positioned as the fastest and most cost-effective model in the Gemini 3 series. Built on the Gemini 3 Pro architecture with a Mixture of Experts design, it prices input at $0.25 per million tokens, one-eighth the input cost of the Pro version. In 11 internal benchmarks, Flash-Lite outperformed GPT-5 mini and Claude 4.5 Haiku on six, with scores including 86.9% on GPQA Diamond and 72.0% on LiveCodeBench. The release is Google's latest move in the competitive AI landscape.

BlockBeats news, March 4: Google has released the preview version of Gemini 3.1 Flash-Lite, positioned as the fastest and most cost-effective model in the Gemini 3 series. Built on the Gemini 3 Pro architecture, it employs a Mixture of Experts (MoE) design that activates only a subset of parameters per token to reduce inference costs. API pricing is set at $0.25 per million input tokens and $1.50 per million output tokens. Against the Gemini 3.1 Pro pricing cited here ($2/$18), that is one-eighth of the input price and one-twelfth of the output price.
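
To make the pricing comparison concrete, here is a minimal sketch that computes per-request cost from the per-million-token prices quoted above. The function name `request_cost` and the workload of 20,000 input tokens and 1,000 output tokens are illustrative assumptions, not figures from the announcement.

```python
from decimal import Decimal

# Preview prices quoted in the article, in USD per one million tokens.
FLASH_LITE = {"input": Decimal("0.25"), "output": Decimal("1.50")}
PRO = {"input": Decimal("2"), "output": Decimal("18")}

MILLION = Decimal(1_000_000)


def request_cost(prices, input_tokens, output_tokens):
    """Return the USD cost of one request at the given per-million prices."""
    total = prices["input"] * input_tokens + prices["output"] * output_tokens
    return total / MILLION


# Hypothetical workload (an assumption for illustration only).
lite_cost = request_cost(FLASH_LITE, 20_000, 1_000)
pro_cost = request_cost(PRO, 20_000, 1_000)

# Price ratios implied by the quoted figures.
input_ratio = FLASH_LITE["input"] / PRO["input"]
output_ratio = FLASH_LITE["output"] / PRO["output"]

print(f"Flash-Lite per request: ${lite_cost}")
print(f"Pro per request:        ${pro_cost}")
print(f"Input price ratio:  {input_ratio}")
print(f"Output price ratio: {output_ratio:.4f}")
```

Because the input and output ratios differ, the blended saving depends on the mix of input and output tokens in a given workload.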


In terms of performance, compared with Gemini 2.5 Flash, time to first token is 2.5 times faster and output speed is 45% higher, reaching 363 tokens per second. The model accepts up to 1 million input tokens and produces up to 64,000 output tokens, and it takes text, image, audio, and video inputs. Across 11 internal benchmarks, Flash-Lite outperforms GPT-5 mini and Claude 4.5 Haiku on 6, scoring 86.9% on GPQA Diamond (PhD-level scientific QA), 76.8% on MMMU-Pro (multimodal reasoning), and 72.0% on LiveCodeBench (code generation).
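
The throughput and context figures above translate into simple capacity checks. The following is a rough sketch under stated assumptions: the helper names are hypothetical, the timing estimate ignores time to first token and network overhead, and the baseline speed is derived by assuming the 45% increase is measured against Gemini 2.5 Flash's output speed.

```python
MAX_INPUT_TOKENS = 1_000_000     # input limit quoted in the article
MAX_OUTPUT_TOKENS = 64_000       # output limit quoted in the article
OUTPUT_TOKENS_PER_SECOND = 363   # output speed quoted in the article


def fits_limits(input_tokens, output_tokens):
    """Check a request's token counts against the quoted limits."""
    return (0 <= input_tokens <= MAX_INPUT_TOKENS
            and 0 <= output_tokens <= MAX_OUTPUT_TOKENS)


def generation_seconds(output_tokens, tokens_per_second=OUTPUT_TOKENS_PER_SECOND):
    """Estimate streaming time for the output alone (no first-token latency)."""
    return output_tokens / tokens_per_second


# Derived, not quoted: the baseline speed implied by a 45% increase to 363 tokens/s.
implied_baseline_tps = OUTPUT_TOKENS_PER_SECOND / 1.45

print(fits_limits(800_000, 8_000))
print(f"{generation_seconds(MAX_OUTPUT_TOKENS):.1f} s for a maximum-length output")
print(f"{implied_baseline_tps:.1f} tokens/s implied baseline")
```

Actual throughput will vary with load, prompt size, and the chosen thinking level, so these numbers are best read as order-of-magnitude estimates.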


The model includes an adjustable "thinking level" that lets developers control the depth of reasoning in AI Studio and Vertex AI, trading off quality against cost in high-volume workloads. Preview access is currently available via the Gemini API (Google AI Studio) and Vertex AI.
