White House and Anthropic Discuss AI Model Safety Evaluation Framework

CoinDesk reports:

The White House is discussing a model safety evaluation framework with AI company Anthropic, aiming to establish a unified classification system for safety vulnerabilities in next-generation AI models and to determine whether government intervention is necessary. The talks follow export restrictions that the U.S. government imposed on Anthropic’s latest model over a security issue classified as a “jailbreak.”

Establishing a unified standard for "jailbreak" vulnerabilities

According to reports, the framework will be used to assess the severity of similar incidents in the future, focusing on three aspects: the extent to which protective measures were bypassed, the model capabilities that were exposed, and the real-world consequences of the vulnerability.

A clear divide remains between government and industry on such issues. Dario Amodei, CEO of Anthropic, and government officials previously disagreed over whether the vulnerabilities in question constituted significant security concerns. The report notes that AI advancements are outpacing the government’s existing frameworks, which are not sufficient to produce consistent evaluations in such disputes.

Negotiations continue after export restrictions were imposed

The White House previously imposed export restrictions on Anthropic, prohibiting overseas users from accessing its latest models, Fable 5 and Mythos 5. The company subsequently suspended external access to these two models.

The report said negotiations between the two sides nearly broke down last Friday, when Anthropic refused to take down Fable as the government had requested, arguing that the related vulnerabilities had limited impact and did not constitute a serious security flaw. The White House then imposed export restrictions, forcing the company to remove the relevant model from the market.

However, communication between the two sides resumed over the weekend. U.S. Secretary of Commerce Howard Lutnick, National Cyber Director Sean Cairncross, and Anthropic co-founder Tom Brown took part in several extended phone calls. The two sides then held nearly a week of in-person talks in Washington.

White House accelerates the implementation of AI safety regulations

According to reports, Anthropic’s negotiating team included Head of Public Policy Sarah Heck and co-founder Tom Brown. The company also sent senior researchers and security experts to the U.S. Department of Commerce on Monday to continue discussions with government officials.

This round of discussions also reflects a more realistic assessment: no AI model can be completely immune to cyberattacks. The government therefore aims to first establish clear standards that companies can use to assess security risks, and only then decide when restrictions are necessary.

This direction also echoes discussions at the recent G7 summit, where leading AI companies and several national leaders emphasized the need to quickly establish clearer standards for measuring model safety to address economic and national security risks posed by increasingly capable AI systems.