Foreign media report that AI programming tools have shifted from optional to default on development teams, but optimistic expectations about efficiency gains are being tempered by rising costs and quality problems. Multiple studies and corporate cases show that while AI does speed up code writing, it does not necessarily reduce subsequent rework.
Developers are no longer willing to work without AI.
In February this year, the AI research lab METR disclosed that its researchers had originally intended to replicate an experiment on programming efficiency, comparing developers who wrote code by hand with those who used AI to complete tasks. They ran into resistance: many developers were unwilling to give up their AI tools even temporarily for the sake of the experiment.
METR previously conducted related tests in 2025. Participants generally felt more efficient, but actual measurements showed the opposite: although code was generated faster, developers spent more time waiting for model outputs, correcting errors, and repeatedly guiding the tool to complete tasks.
Because it was hard to keep developers working without AI, METR later switched to a survey in which technical staff self-assessed the benefits of AI. Respondents generally believed that AI had doubled the value of their work.
Enterprises are reassessing their AI investments.
The article notes that these "feelings of greater efficiency" are now being tested against corporate spending and actual output. Since the start of 2026, Silicon Valley had taken to using token consumption as a measure of AI usage intensity, and even as a proxy for productivity, but this approach has now clearly backfired.
This week, the Financial Times reported that Amazon has shut down its internal token leaderboard, Kirorank, after employees manipulated the system by overusing AI agents to inflate rankings, increasing costs without corresponding improvements in output.
The Information reported that Uber exhausted its entire annual AI budget within the first four months of 2026. Chief Operating Officer Andrew Macdonald said on a recent podcast that this spending has not yet produced measurable project growth or productivity gains.
Writing code faster doesn't mean less maintenance.
The article argues that the bigger issue is code maintenance. Programmer and author James Shore recently pointed out in a widely shared blog post that if coding speed doubles but maintenance costs do not decrease accordingly, the team is merely trading short-term speed for long-term burden.
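The arithmetic behind this argument can be illustrated with a toy model. The sketch below is not Shore's own model, and every number in it is a hypothetical assumption chosen only for illustration: a team with a fixed number of hours per month, a fixed writing cost per feature, and a fixed monthly maintenance cost for each feature already shipped. The only difference between the two runs is that AI is assumed to halve the writing time while leaving maintenance unchanged.

```python
"""Toy model: does faster coding help if maintenance cost per feature is unchanged?

All numbers below are hypothetical and chosen only for illustration.
"""


def simulate(months, capacity_hours, write_hours, maintain_hours):
    """Return a list of (total_features, maintenance_share), one entry per month.

    capacity_hours: team hours available each month (assumed constant)
    write_hours:    hours needed to write one feature
    maintain_hours: hours per month needed to maintain one shipped feature
    """
    features = 0.0
    history = []
    for _ in range(months):
        # Existing code is maintained first; this can never exceed capacity.
        maintenance = min(capacity_hours, features * maintain_hours)
        # Whatever time is left goes to writing new features.
        features += (capacity_hours - maintenance) / write_hours
        history.append((features, maintenance / capacity_hours))
    return history


# Hypothetical inputs: 160 team hours per month, 2 hours of monthly
# maintenance per feature, and AI halving writing time from 40 to 20 hours.
manual = simulate(24, 160, write_hours=40, maintain_hours=2)
with_ai = simulate(24, 160, write_hours=20, maintain_hours=2)

for label, run in (("manual", manual), ("with AI", with_ai)):
    total, share = run[-1]
    print(f"{label}: {total:.1f} features, {share:.0%} of time on maintenance")
```

Under these assumptions both teams approach the same ceiling (capacity divided by per-feature maintenance, here 80 features). The team using AI gets there sooner, and by the end of the run a larger share of its month goes to maintenance rather than new work, which is the short-term speed for long-term burden trade described above.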
Several data points bear on this. Aiswarya Sankar, founder of the reliability engineering startup Entelligence AI, stated that approximately 44% of enterprise token consumption goes toward fixing defects generated by AI. CodeRabbit, a code review tool company, also reported that its analysis of pull requests in open-source projects showed that issues introduced by AI-generated code are 1.7 times more frequent than those from human-written code.
These figures come from vendors with an obvious commercial interest and may be biased, but independent research has issued similar warnings. In April, researchers at Singapore Management University released a report stating that AI-generated code could impose long-term maintenance costs on real software projects.
Researchers recommend managing AI as a "junior developer."
As for remedies, the article notes that some vendors of AI programming agents advocate using more AI to fix the problems AI creates. Scott Wu, founder of Cognition, the company behind the AI programming agent Devin, holds this view.
However, he acknowledged that while Devin can independently complete certain tasks, its current capabilities still generally fall between those of a junior and intermediate programmer, depending on the task type. This means the development team cannot yet fully delegate work and walk away.
In contrast, researchers from Singapore Management University recommend a more human-centric approach: developers must clearly understand what AI is and is not good at, establish quality assurance processes for AI outputs, and review model-generated results as they would audit a junior engineer's code.
The article concludes that human developers remain the primary decision-makers in high-level tasks such as software architecture and security design, a point that even proponents of AI agents largely accept.
