According to 1M AI News, OpenClaw founder Peter Steinberger shared benchmark results from the third-party evaluator PinchBench, which measures how well large language models perform on OpenClaw agent tasks.
The results show Gemini 3 Flash in the lead with a 95.1% success rate on OpenClaw tasks, followed by minimax-m2.1 at 93.6% and kimi-k2.5 at 93.4%. Claude Sonnet 4.5 scores 92.7%, and GPT-4o scores 85.2%.
