New Method Estimates GPT-5.5 at 9.7T, Grok-4 at 3.2T

Summary

A new paper estimates GPT-5.5 at 9.7 trillion parameters and Grok-4 at 3.2 trillion. Li Bojie of Pine AI used 1,400 obscure factual questions to probe memory capacity, comparing closed-source models against a curve fitted to 89 open-source models with known sizes. By this measure, GPT-5.5 is nearly twice the size of second-place Claude Opus 4.6, and an error-overlap analysis suggests that several flagship versions were retrained from scratch rather than fine-tuned.

AIMPACT News, April 30 (UTC+8): According to monitoring by BlockBeats, Li Bojie, Chief Scientist at Pine AI, published a paper titled “Incompressible Knowledge Probes: Estimating Parameter Counts of Black-Box Large Language Models via Fact Capacity.” The study reverse-engineered the parameter counts of closed-source models using 1,400 obscure factual questions. Since storing a fact consumes parameter space, the more obscure facts a model answers correctly, the larger its parameter count must be. Li first fitted a highly accurate curve using 89 open-source models with known parameter counts, then mapped the scores of closed-source models onto this curve to estimate their sizes (a simplified, illustrative version of this mapping is sketched below). The paper evaluated 92 closed-source models; the numbers are not exact but provide meaningful ranges: a model estimated at 9.7T may actually fall anywhere between 3T and 29T. The relative rankings and scale, however, remain highly informative.

GPT-5.5 is estimated at ~9.7T, leading by a wide margin and nearly doubling second-place Claude Opus 4.6 (~5.3T). The second tier (3T–4T) is densely packed: GPT-5 at ~4.1T, Claude Opus 4.7 at ~4.0T, o1 at ~3.5T, Grok-4 at ~3.2T, and o3 at ~3.0T; the flagship models from OpenAI, Anthropic, and xAI all fall within a 1.4x range of one another. The third tier (1T–2T) covers mid-tier flagships: GPT-4.1 at ~2.2T, Claude Sonnet 4.6 at ~1.7T, and Gemini 2.5 Pro at ~1.2T. At the bottom end, smaller models range from GPT-4o at ~720B down to Claude Haiku 4.5 at ~65B.

The base GPT-5 model is estimated at ~4.1T, but the subsequent .x versions (5.1 through 5.4) show reduced fact-storage capacity of only 1.0T–1.5T, until GPT-5.5 jumps to ~9.7T, a genuine breakthrough.

The paper also includes a clever validation method: checking whether two models make the same mistakes on the obscure questions (see the second sketch below). GPT-5’s incremental .x upgrades each produced distinct error patterns (similarity scores all below 0.08), indicating that each version was trained from scratch rather than fine-tuned from prior weights. Claude Opus’s parameter count grew from 1.4T in version 4 to 4.0T in version 4.7, but not through continuous fine-tuning: errors between versions 4 and 4.1 were nearly identical (confirming fine-tuning from the same base), while errors between versions 4.6 and 4.7 showed zero overlap (similarity dropped to 0), indicating that the latest flagship was also trained from scratch.

For MoE (Mixture of Experts) models, total parameters, rather than those activated during inference, best predict knowledge capacity. The study also found that models of the same size, whether from this year or two years ago, retain roughly the same amount of obscure factual knowledge: reasoning ability can improve over time, but factual storage capacity cannot be compressed. The evaluation toolkit and all data have been open-sourced. (Source: BlockBeats)
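As a rough illustration of how such a fitted curve can be inverted, the sketch below fits a log-linear relationship between probe accuracy and known parameter counts, then maps a closed model's accuracy back to an estimated size. The data points, the log-linear form, and the estimate_params_billions helper are hypothetical assumptions for illustration; the paper's open-sourced toolkit may implement the fit differently.

```python
# Illustrative sketch only; the calibration values below are made up.
import numpy as np

# Probe accuracy vs. known parameter count (in billions) for open-source
# calibration models (hypothetical numbers, not the paper's data).
open_params_b = np.array([7, 13, 34, 70, 180, 400, 700, 1500])
open_accuracy = np.array([0.12, 0.17, 0.24, 0.31, 0.39, 0.47, 0.53, 0.61])

# Fit accuracy as a linear function of log10(parameters): acc ~ a*log10(P) + b.
a, b = np.polyfit(np.log10(open_params_b), open_accuracy, deg=1)

def estimate_params_billions(accuracy: float) -> float:
    """Invert the fitted curve to map a closed model's probe accuracy to a size estimate."""
    return 10 ** ((accuracy - b) / a)

# Example: a closed-source model that answers 65% of the probes correctly.
print(f"Estimated size: {estimate_params_billions(0.65):,.0f}B parameters")
```

The error-overlap validation can be sketched just as simply. The snippet below uses Jaccard overlap between the sets of questions two models answered incorrectly; the paper's exact similarity metric is not described here, so this measure and the question IDs are assumptions for illustration.

```python
# Illustrative stand-in for the error-overlap check: high overlap suggests
# fine-tuning from a shared base, near-zero overlap suggests training from scratch.
def error_similarity(wrong_a: set[str], wrong_b: set[str]) -> float:
    """Jaccard overlap between the sets of questions two models got wrong."""
    if not wrong_a and not wrong_b:
        return 1.0
    return len(wrong_a & wrong_b) / len(wrong_a | wrong_b)

# Hypothetical error sets for three model versions.
errors_v4_0 = {"q017", "q233", "q402", "q771", "q908"}
errors_v4_1 = {"q017", "q233", "q402", "q771", "q912"}
errors_v4_7 = {"q055", "q390", "q644"}

print(error_similarity(errors_v4_0, errors_v4_1))  # high overlap (~0.67)
print(error_similarity(errors_v4_0, errors_v4_7))  # zero overlap
```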

Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.