AI Agents Pass Just 2.6% of Real-World Tasks in New Benchmark

CryptoBriefing

A new benchmark from UC Berkeley suggests that AI agent timelines need a serious reality check.

The Agents’ Last Exam, a large-scale evaluation framework built with input from over 250 industry experts across more than 100 institutions, found that mainstream AI agents achieve an average full pass rate of just 2.6% on its hardest tier of real-world professional tasks. The best-performing agent, Codex running on gpt-5-5, managed an overall pass rate of roughly 26%.

What the benchmark actually tests

The benchmark covers 55 non-physical sub-industries organized into 13 clusters, derived from the O*NET/SOC 2018 taxonomy. So far, the team has cataloged more than 1,500 tasks, with an ambitious goal of reaching 5,000. Each task produces verifiable outcomes, meaning there’s no room for the kind of fluent-sounding-but-wrong outputs that large language models have become famous for.

The paper was submitted to arXiv on June 3, 2026, and the project lives at agents-last-exam.org. It’s designed as a living benchmark that will continue expanding in scope and complexity over time.

The collaboration behind it

The initiative was led by UC Berkeley’s Center for Responsible, Decentralized Intelligence (RDI) and drew input from institutions including MIT, Harvard, Stanford, Goldman Sachs, JPMorgan, Meta, Amazon, Adobe, and Snorkel AI.

Why a 26% top score matters

That 26% figure represents the overall pass rate for the best-performing configuration, Codex on gpt-5-5. The average across popular configurations of mainstream agents sits at 2.6% on the hardest tier. Cursor and Claude-based setups followed Codex in the rankings.

The benchmark specifically evaluates long-horizon task performance rather than quick-hit question answering. An AI agent might answer a finance question correctly in isolation but fall apart when asked to execute a multi-step workflow that requires maintaining context, making sequential decisions, and producing a verified deliverable.
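
To see why multi-step workflows are so much harder than single questions, consider a rough sketch of all-or-nothing scoring. This is not the benchmark's actual scoring code, which the article does not describe. The function names, the toy results, and the assumption that steps succeed independently are all hypothetical and used only for illustration.

```python
from typing import Dict, List


def full_pass(checks: List[bool]) -> bool:
    """A task counts as a full pass only if every verifiable check succeeded."""
    return len(checks) > 0 and all(checks)


def full_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks in which all checks passed (no partial credit)."""
    if not results:
        return 0.0
    return sum(full_pass(checks) for checks in results.values()) / len(results)


def chained_success(per_step: float, steps: int) -> float:
    """Chance of completing `steps` steps in a row, assuming each step
    succeeds independently with probability `per_step` (a simplification)."""
    return per_step ** steps


# Made-up results for illustration only; these are NOT benchmark data.
toy_results = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],   # one failed check sinks the whole task
    "task_c": [True, True, False],
    "task_d": [True, True, True],
}

print(full_pass_rate(toy_results))         # 0.5
print(round(chained_success(0.9, 10), 3))  # 0.349
```

Under these assumptions, an agent that gets each step right 90% of the time would still finish a ten-step task only about a third of the time, which is one plausible reason full pass rates sit far below single-question accuracy.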
