Anthropic Report: AI Self-Improvement Advances, but Full Autonomy Remains Distant

MarsBit

Summary
Anthropic’s latest report shows that AI self-improvement is advancing, but full autonomy remains out of reach. As of May 2026, over 80% of the code merged into Anthropic’s main codebase was written by Claude, up from single-digit percentages before Claude Code was released in February 2025. Claude can now handle tasks lasting up to 12 hours, and the latest preview version can operate autonomously for over 16 hours. However, it still cannot independently manage complex end-to-end processes such as self-training. For traders focused on value investing in crypto, this progress highlights the importance of understanding support and resistance levels amid evolving tech landscapes.

According to monitoring by Beating, AI’s ability to iterate autonomously is exceeding expectations. On June 5, The Anthropic Institute released a report titled “When AI Builds Itself,” detailing its progress in recursive self-improvement. The data show that, as of May 2026, over 80% of the code merged into Anthropic’s main codebase was written by Claude itself. Before the release of Claude Code in February 2025, Claude’s contributions accounted for only single-digit percentages.

On May 13, Zhipu AI’s founder Tang Jie predicted that the ultimate endpoint of large models would be self-evolution, and that Claude may have already established a baseline for self-training involving “writing code, cleaning data, and training itself.” However, Anthropic explicitly clarified in the report that fully autonomous recursive self-improvement, in which a model designs and develops its successors entirely on its own, has not yet been achieved.

AI’s role in the development pipeline is currently shifting from localized efficiency gains to autonomous decision-making. In the second quarter of 2026, Anthropic engineers averaged eight times as many code merges per day as in 2024. The current development process is simple: engineers define goals and conduct reviews, while Claude handles coding and execution. Anthropic has also deployed Claude as an automated code reviewer to detect bugs and security vulnerabilities. This confirms that Tang Jie’s concept of “self-evaluation” has been implemented in engineering practice, though human review remains the final safety checkpoint.

The length of tasks that models can carry out reliably on their own is also growing quickly: the duration for which models can operate autonomously roughly doubles every four months. In March 2024, Claude 3 Opus could handle only simple tasks lasting four minutes. By March 2025, Claude 3.7 Sonnet could sustain autonomous work for 1.5 hours. By March 2026, Claude 4.6 Opus could manage complex tasks lasting up to 12 hours.
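
The “doubles every four months” figure can be checked against the three data points quoted above. The Python sketch below is not taken from the report: it assumes smooth exponential growth between consecutive data points, and the names `observations` and `doubling_time_months` are hypothetical, chosen only for illustration.

```python
import math

# Task-duration figures as quoted in the article, in minutes of autonomous work.
# Months are counted from March 2024. Names here are illustrative assumptions.
observations = [
    ("Claude 3 Opus", 0, 4),         # March 2024: 4 minutes
    ("Claude 3.7 Sonnet", 12, 90),   # March 2025: 1.5 hours
    ("Claude 4.6 Opus", 24, 720),    # March 2026: 12 hours
]


def doubling_time_months(t0, d0, t1, d1):
    """Months per doubling, assuming exponential growth between two points."""
    doublings = math.log2(d1 / d0)
    return (t1 - t0) / doublings


# Doubling time for each consecutive pair of data points.
for (name_a, t0, d0), (name_b, t1, d1) in zip(observations, observations[1:]):
    months = doubling_time_months(t0, d0, t1, d1)
    print(f"{name_a} -> {name_b}: {months:.1f} months per doubling")

# Doubling time across the whole two-year span.
_, t_first, d_first = observations[0]
_, t_last, d_last = observations[-1]
overall = doubling_time_months(t_first, d_first, t_last, d_last)
print(f"Overall: {overall:.1f} months per doubling")
```

Under these assumptions the quoted figures imply roughly 2.7 months per doubling between March 2024 and March 2025, 4.0 months between March 2025 and March 2026, and about 3.2 months over the full span, so “roughly four months” matches the most recent interval best.
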
According to data from the evaluation organization METR, the latest Claude Mythos preview version can operate autonomously for over 16 hours, nearing the current limits of evaluation tools. At this rate, by 2027 AI would be capable of independently completing scientific research tasks that typically require weeks of human effort, enabling companies to move from “one-person companies” to “no-person companies.”

As for Tang Jie’s speculation about a “self-training baseline,” the report reveals only a localized “microscopic experimental loop.” In experiments on speeding up small-model training code, Claude 4 Opus achieved only a 3x speedup in May 2025, while the Claude Mythos preview achieved a 52x speedup in April 2026. By comparison, top human researchers typically achieve a 4x improvement within 4 to 8 hours. However, the optimization objectives and success metrics in these experiments were predefined by humans. When faced with the full end-to-end chain of “cleaning data, generating synthetic data, and self-training,” AI still lacks the decision-making capability to navigate it independently.

Nevertheless, the emergence of autonomous closed-loop development is pushing humanity toward the brink of losing ultimate control over the system. Tang Jie’s prediction that an LLM OS will replace traditional architectures, with applications generated on demand, implies that future computer systems will run dynamic code that cannot be pre-audited. Anthropic’s warning that “human review cannot keep pace with AI’s self-evolution” means we would no longer be able to monitor even the origin of generated code. When AI begins autonomously designing and training its successors, software evolution will become a complete black box. Once AI is allowed to perform unaudited self-iterations within such a black-box system, ensuring the safety isolation, monitoring, and behavioral alignment of future self-improving systems will become extraordinarily difficult.
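
The 2027 projection above rests on simple extrapolation, which can be made explicit. The sketch below is an illustration, not the report’s method: it assumes the four-month doubling continues unchanged, starts from the 16-hour figure attributed to METR, and treats one “week of human effort” as 40 working hours. The function name `months_to_reach` and both constants are hypothetical.

```python
import math

START_HOURS = 16       # autonomous duration quoted for the Mythos preview
WORK_WEEK_HOURS = 40   # assumption: one week of human effort = 40 working hours


def months_to_reach(current_hours, target_hours, doubling_months=4.0):
    """Months until target_hours is reached under constant exponential doubling."""
    if target_hours <= current_hours:
        return 0.0
    return doubling_months * math.log2(target_hours / current_hours)


# How long until tasks worth 1, 2, or 4 working weeks come within reach?
for weeks in (1, 2, 4):
    target = weeks * WORK_WEEK_HOURS
    months = months_to_reach(START_HOURS, target)
    print(f"{weeks} week(s) = {target} h: about {months:.1f} months away")
```

With these assumptions, a two-week task (80 hours) is about 9.3 months away and a four-week task (160 hours) about 13.3 months away. That is broadly in line with the 2027 timeframe, but only if the doubling trend holds and if task length measured in evaluations carries over to real research work.
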

Disclaimer: The information on this page may have been obtained from third parties and does not necessarily reflect the views or opinions of KuCoin. This content is provided for general informational purposes only, without any representation or warranty of any kind, nor shall it be construed as financial or investment advice. KuCoin shall not be liable for any errors or omissions, or for any outcomes resulting from the use of this information. Investments in digital assets can be risky. Please carefully evaluate the risks of a product and your risk tolerance based on your own financial circumstances. For more information, please refer to our Terms of Use and Risk Disclosure.