AI vs. Human Predictions in Prediction Markets: Grok Outperforms Humans

Original | Odaily Planet Daily (@OdailyChina)

With most crypto sectors having failed to prove out, prediction markets have become one of the few areas of the crypto space still showing positive growth. On November 20, Nan Zhi began trying to identify smart money in prediction markets, using an approach similar to the one used last year to find smart money in meme coins, and the initial results were quite promising.

In early December, while testing models around the launch of Gemini 3 Pro, I wondered whether AI could be used to analyze and predict these markets, and whether having humans compete against AI would show which side predicts more accurately.

When introducing prediction markets, it is often claimed that they drive the market toward "truth" by encouraging "informed individuals" to bet with real money. However, others argue that combining cryptocurrency with prediction markets allows "insiders" to safely profit from information asymmetry, thus pushing the market toward "insider outcomes." This essentially represents a clash between two perspectives: the "wisdom of the crowd" and the idea that "truth lies in the hands of a few." AI-based predictions tend to align more with the "wisdom of the crowd" approach, and therefore require a substantial amount of available knowledge and insights.

For that reason, Gemini and Grok were chosen as the initial models: they draw on Google and the X platform respectively, which gives them the most direct access to a vast amount of knowledge and insights. Nan Zhi recently added a "Doubao + Douyin knowledge" combination, but it has answered too few questions so far and is not covered in this article.

Basic Rules

AI versions: Gemini 2.5 Pro (with built-in Google Search), Grok 4 Fast (called via OpenRouter, with native search enabled)
Topic selection: Humans choose the topics to bet on and the AI then predicts the same topics; the Crypto sector is excluded.
Input content: Official title (Title), official description (Description), answer options (in practice only Yes and No)

Note: Polymarket organizes questions into broad "Events" and, within each Event, individual "Markets." An Event is a question such as "Who will be the next Federal Reserve Chair?" or "When will Strategy sell Bitcoin?" Each Event contains N Market questions, such as "Will Hassett become the next Federal Reserve Chair?" or "Will Strategy sell Bitcoin before March 31, 2026?" To match the format in which humans bet, we used the Market as the unit of AI judgment and did not supply the other options. For example, we asked the AI to judge "Will Hassett become the next Federal Reserve Chair?" directly, rather than asking it to pick the most likely candidate from N options.
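The Event/Market relationship described in the note can be sketched as a small data model. This is only an illustration: the class names, fields, and the `questions_for_ai` helper are hypothetical and do not reflect Polymarket's actual API schema or the author's code.

```python
from dataclasses import dataclass, field


@dataclass
class Market:
    """One Yes/No sub-question inside an event (hypothetical structure)."""
    title: str
    description: str
    outcomes: tuple = ("Yes", "No")


@dataclass
class Event:
    """A broad question that groups several Yes/No markets."""
    title: str
    markets: list = field(default_factory=list)


def questions_for_ai(events):
    """Flatten events into individual markets: one AI judgment per market."""
    return [market for event in events for market in event.markets]


strategy_event = Event(
    title="When will Strategy sell Bitcoin?",
    markets=[
        Market(
            title="Will Strategy sell Bitcoin before March 31, 2026?",
            description="(official resolution text would go here)",
        ),
    ],
)

questions = questions_for_ai([strategy_event])
print(questions[0].title)
```

Flattening to markets means the model never sees the sibling options, which matches the choice described above to ask one Yes/No question at a time.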

Prompt Design:
Ask the AI to search for the latest news, official announcements, and expert analysis.
Prohibit the use of prediction market data.
Require judgments to be reasoned logically from "evidence."
Require the output to give only "Yes" or "No" as the answer, followed by an explanation of the reasoning behind it.
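A minimal sketch of how these prompt rules might be assembled and how the Yes/No answer could be extracted from a reply. The wording of `PROMPT_TEMPLATE` and the helpers `build_prompt` and `parse_answer` are assumptions for illustration; the article does not publish the actual prompt or parsing code.

```python
import re

# Hypothetical template that restates the four rules listed above.
PROMPT_TEMPLATE = """You are judging a Yes/No question.
Title: {title}
Description: {description}
Options: Yes, No

Search for the latest news, official announcements, and expert analysis.
Do not use prediction market data (prices, odds, or trading volume).
Reason logically from the evidence you find.
Answer with only "Yes" or "No" on the first line, then explain your reasoning."""


def build_prompt(title, description):
    """Fill the template with a market's official title and description."""
    return PROMPT_TEMPLATE.format(title=title, description=description)


def parse_answer(reply):
    """Return "Yes" or "No" from the first line of a reply, or None if unclear."""
    lines = reply.strip().splitlines()
    if not lines:
        return None
    # Allow leading punctuation such as markdown asterisks before the answer.
    match = re.match(r"\W*(yes|no)\b", lines[0], flags=re.IGNORECASE)
    return match.group(1).capitalize() if match else None


print(parse_answer("No\nThe official sources only mention Gemini 3 Pro."))
```

Returning `None` for an unparseable reply keeps a malformed answer from being silently counted as a prediction.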

Current results

Of the prediction topics, 21 have been settled. Grok has the highest win rate at 75%, humans are at 66.7%, and Gemini has the lowest at 52.4%. The current results can be viewed on the related website.
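The win-rate arithmetic can be sketched as follows. The counts used here (14 and 11 correct out of 21) are inferred because they reproduce the reported 66.7% and 52.4%; the article does not state them. Note also that 75% does not correspond to a whole number of wins out of 21, so Grok's figure may rest on a different denominator; the source does not say. Skipping unanswered questions in `win_rate` is an assumption, not the author's documented method.

```python
def win_rate(predictions, outcomes):
    """Share of answered, settled questions that were predicted correctly.

    predictions: dict of question id -> "Yes"/"No" (None or missing if unanswered)
    outcomes:    dict of question id -> settled result, "Yes" or "No"
    """
    graded = [(predictions.get(q), result) for q, result in outcomes.items()]
    # Assumption: questions without a recorded answer are not counted.
    graded = [(p, r) for p, r in graded if p is not None]
    if not graded:
        return None
    correct = sum(1 for p, r in graded if p == r)
    return correct / len(graded)


# Illustrative data only: 21 settled questions, all resolved "Yes".
settled = {f"q{i}": "Yes" for i in range(21)}
human_picks = {q: ("Yes" if i < 14 else "No") for i, q in enumerate(settled)}
gemini_picks = {q: ("Yes" if i < 11 else "No") for i, q in enumerate(settled)}

human_rate = win_rate(human_picks, settled)
gemini_rate = win_rate(gemini_picks, settled)
print(f"humans {human_rate:.1%}, Gemini {gemini_rate:.1%}")
```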

What mistakes did the AI make?

Gemini occasionally misjudges the current time

In the question "Will Trump's approval rating hit 35% in 2025?", Gemini assumed it was still the first half of 2025, reasoned that anything could therefore still happen, and gave an essentially arbitrary answer.

However, when the author queried Gemini programmatically and asked it to output the current time, it answered correctly. It is still unclear why the time error occurred in the prediction itself.
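One possible mitigation, which the article does not report trying, is to state the current date explicitly at the top of every prompt so the model does not have to infer it. The helper `with_current_date` below is a hypothetical sketch, and whether it would actually remove the error is untested.

```python
from datetime import date, datetime, timezone


def with_current_date(prompt, today=None):
    """Prepend an explicit date line to a prompt (defaults to today's UTC date)."""
    if today is None:
        today = datetime.now(timezone.utc).date()
    return f"Today's date is {today.isoformat()} (UTC).\n\n{prompt}"


example = with_current_date(
    "Will Trump's approval rating hit 35% in 2025?",
    today=date(2025, 12, 16),  # fixed date so the example is reproducible
)
print(example)
```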

Insufficient depth in AI thinking

In the question "Gemini 3.0 Flash released by December 16?", Grok only considered current information, stating that "the official sources have recently only mentioned Gemini 3 Pro and version 2.5, rarely mentioning 3 Flash, so there is insufficient evidence to determine this."

Meanwhile, Gemini pointed out, "Gemini 1.0 was released in December 2023, and the experimental version of Gemini 2.0 Flash was launched in December 2024. Following this pattern, it is logical to release version 3.0 by the end of 2025," and also discovered "a recently leaked (December 14, 2025) demonstration of 'Gemini 3.0 Flash' circulating in online communities, further increasing the likelihood of its imminent public release."

Although Gemini's answer ultimately turned out to be wrong, this question shows a clear difference in the breadth of information the two models relied on.

AI makes inferences based on common sense rather than on evidence and logic

In the question "Trump approval Up or Down this week?", Gemini states that "predicting the public opinion approval rate for a single week more than a year ahead is highly uncertain," which again demonstrates a "time misjudgment." Then Gemini claims that "in any given typical week, the probability of events that could cause a slight decline in approval might be slightly higher than the probability of positive events that could significantly boost approval," thus suggesting a higher likelihood of a decline in approval. The conclusion generated is based solely on subjective common-sense assumptions.

Grok, by contrast, based its answer on news reports and polling data regarding "government shutdowns, economic concerns, controversies over immigration policies, and negative backlash triggered by comments on Robin Williams' death," which is the kind of evidence-based reasoning the prompt was designed to produce.

Misjudging the settlement conditions

In the question "Will Trump release the Epstein files by December 20?", both Gemini and Grok already knew that "the government will release 'hundreds of thousands of pages' of documents on Friday (December 19)." The resolution criteria clearly state that "if the government publicly releases any documents related to Epstein's illegal activities that have not been made public before the listed date, it will be judged as Yes."

Despite this, Gemini stated that it was "impossible to complete the disclosure of 'all' documents before December 20." It read the market as requiring the release of all documents when the criteria only require any previously unreleased document, and so it gave an incorrect answer.

Summary

In summary, Grok's predictive accuracy has so far surpassed that of the "smart money" players who have made hundreds of thousands or even millions of dollars in prediction markets. A closer look at the models' reasoning, however, still reveals many areas where they can be guided and improved.