BlockBeats news, March 4: Google has released a preview of Gemini 3.1 Flash-Lite, positioned as the fastest and most cost-effective model in the Gemini 3 series. Built on the Gemini 3 Pro architecture, it uses a Mixture of Experts (MoE) design that activates only a subset of parameters per token to reduce inference cost. API pricing is $0.25 per million input tokens and $1.50 per million output tokens, one-eighth of Gemini 3.1 Pro's $2 input price and one-twelfth of its $18 output price.
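At the published rates, the cost of a single call is easy to estimate. A minimal Python sketch (the token counts in the example are hypothetical, chosen only for illustration):

```python
# Published preview rates for Gemini 3.1 Flash-Lite, USD per million tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API call at the published rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 10,000-token prompt with a 1,000-token response (hypothetical sizes).
print(f"${request_cost(10_000, 1_000):.4f}")  # prints $0.0040
```

At these prices, even a long-context request stays well under a cent, which is the point of the Flash-Lite tier.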
On performance, Google says time to first token is 2.5 times lower than Gemini 2.5 Flash's, and output speed is 45% higher at 363 tokens per second. The model supports up to 1 million input tokens and 64,000 output tokens, and accepts text, image, audio, and video inputs. Across 11 internal benchmarks, Flash-Lite outperforms GPT-5 mini and Claude 4.5 Haiku on 6, scoring 86.9% on GPQA Diamond (PhD-level science QA), 76.8% on MMMU-Pro (multimodal reasoning), and 72.0% on LiveCodeBench (code generation).
The model includes an adjustable "thinking level" that lets developers control reasoning depth in AI Studio and Vertex AI, trading answer quality against cost in high-volume scenarios. Preview access is currently available via the Gemini API (Google AI Studio) and Vertex AI.
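The thinking level is set per request. A minimal sketch of what a Gemini API generateContent request body might look like, assuming the API's camelCase `generationConfig`/`thinkingConfig` fields carry a `thinkingLevel` setting for this model; the field name and preview model ID are assumptions for illustration, not confirmed by the announcement:

```python
import json

# Hypothetical request body for the Gemini API's generateContent endpoint.
# The "thinkingLevel" field and its "low" value are assumptions for illustration.
payload = {
    "contents": [
        {"role": "user", "parts": [{"text": "Summarize this release note."}]}
    ],
    "generationConfig": {
        # A lower thinking level trades reasoning depth for latency and cost.
        "thinkingConfig": {"thinkingLevel": "low"}
    },
}

print(json.dumps(payload, indent=2))
```

In a high-volume deployment, the idea is to keep the level low for routine requests and raise it only where answer quality measurably suffers.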
