AlphaFold's throne is under threat!
Nature reports that Zuckerberg's Biohub has delivered a major breakthrough, releasing 1.1 billion predicted protein structures, 800 million more than the AlphaFold database.
The underlying AI model, ESMFold2, is claimed to outperform AlphaFold3 across all metrics.
More importantly, it is fully open-source and allows commercial use without restrictions.

https://www.nature.com/articles/d41586-026-01686-3
Google DeepMind’s long-standing dominance in protein AI is being challenged by an open-source disruptor.
The landscape of the protein AI race may be about to be redrawn.
1.1 billion protein structures, served right on your plate.
On May 27, the biomedical research institute Biohub, founded by the Zuckerberg family, officially launched the ESM Atlas protein structure database.
It contains 1.1 billion predicted protein structures, along with 6.8 billion protein sequence entries.
AlphaFold's database has accumulated over 200 million structure predictions, while the ESM Atlas adds another 800 million.
The AI model that generated these predictions is called ESMFold2, developed by a team led by Alex Rives, Scientific Director of Biohub.

Rives says:
This graph illustrates the entirety of protein biology, particularly the most unknown areas.
Why is protein structure prediction important?
Proteins are the core components that drive life; knowing their shape allows us to understand their function, enabling the design of new drugs and the treatment of diseases.
AlphaFold's developers shared the 2024 Nobel Prize in Chemistry, a landmark case of AI transforming science.
Now a new model has emerged with a dataset roughly five times as large.
As an AI model, what are the strengths of ESMFold2?
ESMFold2 took a different technical approach from AlphaFold.
It is built on the "protein language model" released in 2024, drawing inspiration from NLP approaches by treating protein sequences as a "language." Trained on billions of protein data points, the model learns to predict 3D structures directly from sequences.
Anyone who works in AI should find this familiar: it is the same logic by which large language models learn human language.
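To make the "sequence as language" idea concrete, here is a minimal sketch of how a protein sequence can be turned into tokens and partially masked, in the way text is prepared for a language model. The vocabulary layout, the special tokens, and the helper names `tokenize` and `mask_tokens` are hypothetical and chosen for illustration. The masking objective is a common language-model training setup and is assumed here; the article does not describe how ESMFold2 is actually trained.

```python
import random

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Special tokens are illustrative, not taken from any real model.
SPECIAL = ["<pad>", "<mask>", "<unk>"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL + list(AMINO_ACIDS))}


def tokenize(sequence):
    """Map each residue letter to an integer id, as a text tokenizer maps words."""
    return [VOCAB.get(res, VOCAB["<unk>"]) for res in sequence.upper()]


def mask_tokens(ids, mask_prob=0.15, seed=0):
    """Hide a random subset of residues; a model would be trained to recover them."""
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in ids:
        if rng.random() < mask_prob:
            masked.append(VOCAB["<mask>"])
            targets.append(tok)   # the hidden residue the model must predict
        else:
            masked.append(tok)
            targets.append(None)  # position not scored
    return masked, targets


if __name__ == "__main__":
    ids = tokenize("MKTAYIAKQRQISFVKSHFSRQ")  # arbitrary example sequence
    masked, targets = mask_tokens(ids, mask_prob=0.3, seed=1)
    print(ids)
    print(masked)
```

A model trained to fill in the hidden residues has to pick up the statistical regularities of protein sequences, which is the sense in which the sequence is treated as a "language." Predicting 3D coordinates from those learned representations is a separate step that this sketch does not cover.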
The coverage of training data is a key variable.
ESMFold2 incorporates a large amount of microbial protein data from environments such as soil and oceans, which are absent from AlphaFold's database.
The broader the coverage, the more complete the model's understanding of the "protein world."
The Biohub team states that ESMFold2 outperforms AlphaFold3 in predicting the complex structures of protein-protein interactions.
The most convincing evidence, however, is not benchmark scores but real-world validation.
The team designed entirely new proteins using ESMFold2, which were synthesized and tested in the lab, with a high proportion of the designs functioning as expected.
From "prediction" to "design" and then to "validation," once this pipeline is completed, value extends from academic papers into the real world.

Fully open source: the biggest advantage
ESMFold2's most powerful competitive advantage is that it is fully open-source and unrestricted for commercial use.
The strategic significance of this choice becomes clearer when viewed in the context of the entire AI industry.
Although AlphaFold has an open database, AlphaFold3 imposed restrictions on commercial use during its initial release.
Isomorphic Labs, the Alphabet company spun out of DeepMind, launched its protein interaction prediction model this year as completely proprietary.
Ovchinnikov, a computational biologist at MIT, directly highlighted the value of open source: "I expect many people will be excited to try ESMFold2."
The leverage effect of open-source AI has been fully validated in the large language model space, with Meta's Llama series serving as the best example.
A sufficiently powerful open-source model can mobilize the global community to iterate, apply, and discover use cases even the original developers never imagined.
In protein AI the case is even more concrete: many laboratories and research institutions worldwide urgently need a free, unrestricted structure prediction tool, and however powerful a proprietary model is, its user base remains limited.
Biohub has gone fully open source, the same strategy Meta followed with large language models.
Zuckerberg's strategy in the AI field is becoming increasingly clear: use open source for infrastructure and build a moat through the ecosystem.

Will the experts buy in?
The academic community has responded positively, but clear reservations remain.
Gemma Atkinson from Lund University in Sweden called the ESM Atlas "a remarkable resource for biology."

Christine Orengo from University College London acknowledges its value but emphasizes that prediction results require independent validation.

A sharper question came from Martin Steinegger of Seoul National University.

He is concerned about how ESMFold2 performs when faced with "novel structures" that differ significantly from known proteins.
His team previously found that the first version of ESMFold did not perform well in this regard. This issue remains unresolved for ESMFold2.
Ovchinnikov from MIT provided the most measured assessment, suggesting that ESM Atlas is better suited as a complement to the AlphaFold database.

He also noted that Isomorphic Labs' proprietary model, along with some open-source models from Biohub that were not directly comparable, achieved similar levels of results.
The lead of ESMFold2 may not be as significant as the paper suggests.
This caution reflects just how intensely competitive the protein AI race has become.
Open-source, closed-source, academic, and commercial models are all iterating at an extremely rapid pace.
Today's "strongest" may be surpassed in six months. This pace is already very similar to the arms race in the large language model space.
When AI Begins to Read the Source Code of Life
In the past, determining the three-dimensional structure of a protein could take months to years of laboratory work.
AlphaFold first demonstrated that AI can accomplish this in minutes.
ESMFold2 has now pushed prediction scale to the billion level, covering a vast number of previously uncharacterized proteins.
Pushing this line of reasoning further, if AI can accurately predict all protein structures and design entirely new functional proteins that are experimentally validated, the deployment of AGI in life sciences may be closer than most people realize.
If ASI truly arrives, biology will no longer be a discipline to be "studied," but a system that can be "engineered."
That would mean designing life at the molecular level, customizing proteins on demand, and rewriting the rules of evolution.
This sounds like science fiction, but tools like ESMFold2 are gradually turning "science fiction" into "engineering problems."
Today, 1.1 billion protein structures are laid out on the table, freely accessible to any scientist worldwide with an internet connection.
This means AI's understanding of life has taken another step forward.
Reference: https://www.nature.com/articles/d41586-026-01686-3
This article is from the WeChat public account "New Intelligence Yuan," authored by ASI Revelation; edited by Marco.
