Odaily Planet Daily reports that SlowMist CISO 23pds posted on X about results from the PinchBench benchmark, which evaluates how large language models perform on OpenClaw agent tasks. Gemini 3 Flash achieved the highest success rate at 95.1%. The models minimax-m2.1 and kimi-k2.5 ranked second and third with success rates of 93.6% and 93.4%, respectively. Claude Sonnet 4.5 scored 92.7%, while GPT-4o achieved 85.2%.
PinchBench Benchmark: Gemini 3 Flash Leads AI Models with a 95.1% Success Rate in OpenClaw Tasks

Gemini 3 Flash achieved a 95.1% success rate on the PinchBench benchmark for OpenClaw agent tasks, the highest among the models reported. Minimax-m2.1 and Kimi-k2.5 followed with 93.6% and 93.4%, respectively, while Claude Sonnet 4.5 and GPT-4o scored 92.7% and 85.2%. The test evaluated real-world agent performance, and regulators monitoring CFT compliance may use such metrics to enhance transparency.
Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information.
Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.