Users of the Google Gemini API report sky-high billing errors

MarsBit

Summary: Users of the Google Gemini API are reporting billing problems, with one developer charged nearly RMB 27,000 within 12 hours. The problems include charges for deleted caches and for tasks that return no output. Two bugs, phantom cache billing and infinite reasoning loops, remain unresolved, and Google has not yet provided a fix or a refund process. The reports highlight rising concerns about API cost management and transparency, with developers urging Google to issue a clear response.

According to monitoring by Beating, recent posts on Google’s AI developer forum include multiple urgent appeals about runaway billing in the Gemini API. Several developers, while using the service normally, have faced massive unexpected charges caused by underlying system bugs; one developer, for example, was charged nearly RMB 27,000 within just 12 hours. To date, Google’s billing and engineering teams have continued to deflect responsibility, and Google has issued no official statement or expedited refund process.

Investigations have identified two core bugs behind the exorbitant bills.

First, the “phantom cache” bug: after context caches created via the API expire or are deleted, and the frontend management interface shows them as cleared, Google’s backend billing system continues charging at rates of thousands of yuan per hour as if the caches were still active.

Second, the “infinite reasoning trap”: when tools such as web search are enabled, the model’s thinking budget limit fails, and the model enters an infinite loop on simple tasks, consuming up to 64,000 tokens before timing out and crashing. Even when the final output is empty (no useful response is returned), Google still charges full price for the thinking tokens, with fees spiking up to 1,500 times normal levels.

Because Google Cloud’s billing system lags by 32 to 72 hours and offers no automatic spending caps or circuit breakers, developers are hit with massive charges before receiving any alerts. With official customer support deflecting blame and no substantive responses on the forums, some affected developers have announced that they are completely disabling Gemini’s context caching and reasoning models in production environments to limit their financial risk.
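
Since the reports describe a billing pipeline with no automatic spending cap or circuit breaker, one stopgap a developer might consider is a client-side guard that tracks estimated spend locally and refuses further calls once a limit is reached. The following is only a minimal sketch of that idea, not anything Google or the affected developers have published. The class `SpendGuard`, its fields, and the price used are all hypothetical, and the guard calls no Gemini API. It only counts tokens that the caller reports, so it cannot detect charges that originate on the server side, such as the phantom cache billing.

```python
from dataclasses import dataclass


@dataclass
class SpendGuard:
    """Hypothetical client-side circuit breaker for API spend."""

    cap: float                          # hard limit, in your billing currency
    price_per_1k_tokens: float          # assumed rate; substitute your real price
    max_tokens_per_call: int = 64_000   # runaway threshold (figure from the reports)
    spent: float = 0.0
    tripped: bool = False

    def allow(self) -> bool:
        """Check this before every request; False means stop calling the API."""
        return not self.tripped

    def record(self, billed_tokens: int) -> float:
        """Add one call's billed tokens (output plus thinking) to the total."""
        cost = billed_tokens / 1000 * self.price_per_1k_tokens
        self.spent += cost
        # Open the breaker on a runaway call or once the cap is passed.
        if billed_tokens >= self.max_tokens_per_call or self.spent > self.cap:
            self.tripped = True
        return cost


if __name__ == "__main__":
    guard = SpendGuard(cap=10.0, price_per_1k_tokens=0.05)
    guard.record(2_000)     # an ordinary call
    guard.record(64_000)    # a call that burned the full thinking budget
    print("requests allowed:", guard.allow())
```

The token count passed to `record` would come from whatever usage figures the API response reports. Because the breaker reacts per call rather than waiting on a delayed invoice, it bounds the damage from a reasoning loop to roughly one runaway request.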

Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.