DeepMind CEO Proposes 'Einstein Test' as Benchmark for True AGI

Demis Hassabis wants AI to do something no AI has ever done: think like Einstein. Not mimic Einstein. Not regurgitate Einstein’s papers. Actually replicate the kind of creative leap that produced general relativity, starting from scratch with only the information available before 1911.

The CEO of Google DeepMind has been refining what he considers the real benchmark for artificial general intelligence, and it’s far more demanding than anything the industry typically discusses.

The Einstein test, explained

Here’s the setup. You train an AI on all human knowledge up to a specific cutoff date, say 1901 or 1911. Then you ask it to derive something like special relativity (published in 1905) or general relativity (published in 1915). In plain terms: could an AI, given only what scientists knew at the turn of the 20th century, make the same intuitive and creative leaps that Einstein made?

The answer, right now, is a resounding no.
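For readers who want the setup in concrete terms, the data side of the test amounts to a strict date filter on the training corpus. The snippet below is only an illustrative sketch, not anything Hassabis or DeepMind has published: the Document type, the split_by_cutoff helper, and the four-entry corpus are hypothetical stand-ins, and a real version would also need reliable dating for every source and a way to keep later commentary from leaking in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical corpus entry: a title and the year it was published."""
    title: str
    year: int


def split_by_cutoff(corpus, cutoff_year):
    """Split a corpus into training material and held-out discoveries.

    Anything published before cutoff_year may be used for training;
    everything from cutoff_year onward is withheld and serves only as
    the target the model is asked to rediscover.
    """
    training = [doc for doc in corpus if doc.year < cutoff_year]
    held_out = [doc for doc in corpus if doc.year >= cutoff_year]
    return training, held_out


# Toy corpus for illustration only; a real one would hold millions of texts.
corpus = [
    Document("Maxwell, Treatise on Electricity and Magnetism", 1873),
    Document("Michelson-Morley experiment", 1887),
    Document("Einstein, special relativity", 1905),
    Document("Einstein, general relativity", 1915),
]

training_1901, held_out_1901 = split_by_cutoff(corpus, 1901)
training_1911, held_out_1911 = split_by_cutoff(corpus, 1911)
```

With a 1901 cutoff, both Einstein entries are withheld. With a 1911 cutoff, special relativity becomes training material and only general relativity remains as the target. The filter is the easy part; the test itself is whether a model trained on the left-hand side can produce the right-hand side.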

Hassabis has been explicit that even DeepMind’s most impressive achievements don’t clear this bar. Hassabis and his DeepMind colleague John Jumper shared one half of the 2024 Nobel Prize in Chemistry for protein structure prediction with AlphaFold. But in Hassabis’s own framework, predicting protein structures doesn’t constitute true AGI because the system operated within a defined problem space with known rules.

Similarly, he considers solving Erdős problems, notoriously difficult open questions in mathematics, insufficient evidence for general intelligence. The distinction Hassabis draws is between solving hard problems within existing paradigms and creating entirely new paradigms.

A shifting timeline

In early 2025, Hassabis suggested AGI was “probably three to five years away.” By 2026, he refined that estimate to around 2030, plus or minus one year.

Why this matters beyond the AI bubble

The way Hassabis frames AGI matters because it sets the terms of the debate for the entire industry. If you accept his Einstein test as the standard, then virtually every claim about “achieving AGI” from competitors becomes premature. OpenAI, Anthropic, Meta, and others have all used varying definitions of AGI, some far more permissive than what Hassabis describes.

OpenAI, for instance, has historically tied its AGI definition to economic output: roughly, a system that can do most of the economically valuable work that humans do. That’s a much lower bar than independently deriving general relativity.

The crypto market doesn’t have a direct connection to these AGI discussions, and Hassabis has made no references to blockchain, tokens, or decentralized protocols in his remarks.