Apple unveiled a brand-new Core AI framework at WWDC. It replaces Core ML after nine years of service and was rewritten from the ground up for the large-model era.

Article author and source: AI New Era

Cook's final WWDC turned Apple's AI foundation upside down.

Apple has discontinued Core ML after nine years of service, replacing it with Core AI, which was rewritten from the ground up for large models.

Apple drew the same line for all AI.

The all-new Core AI is Apple’s on-device AI inference framework designed for the large model era.

It uniformly orchestrates the CPU, GPU, and Neural Engine, natively supporting core LLM capabilities such as autoregressive generation, streaming responses, and multi-turn conversations, across all platforms from iOS 17 to watchOS 17.

In simple terms, Core ML is for traditional machine learning, while Core AI is for large models.

Meanwhile, the accompanying toolchain has been completely rebuilt.

The new .aimodel format, combined with Apple’s open-source CoreAI-Torch conversion toolkit and Xcode’s performance optimization and ahead-of-time compilation features, covers the entire workflow from model conversion to deployment and listing.

Take a language learning app as an example.

A student points their phone at a hummingbird; SAM3 simultaneously performs two tasks on the device: identifying the object in the frame as a "Hummingbird" and precisely segmenting the hummingbird from the background to generate a clean card image.

A Qwen model with 0.6 billion parameters then takes over the text processing. Using the recognition result, it generates a structured flashcard with three fields (Chinese word, English definition, and example sentence), each correctly populated, and returns it as a native Swift type rather than a raw text block that needs further parsing.
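To make the difference between a typed result and a raw text block concrete, here is a minimal sketch in Python. The article describes a native Swift type; the `Flashcard` class, its field names, and the `parse_flashcard` helper are illustrative assumptions, not part of Core AI or any Apple API, and the model output is a hard-coded stand-in.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Flashcard:
    # Field names are hypothetical; the article only lists the three fields.
    chinese_word: str
    english_definition: str
    example_sentence: str


def parse_flashcard(raw: str) -> Flashcard:
    """Turn a JSON string into a typed Flashcard, rejecting incomplete output."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    required = ("chinese_word", "english_definition", "example_sentence")
    missing = [key for key in required if not isinstance(data.get(key), str)]
    if missing:
        raise ValueError(f"missing or non-string fields: {missing}")
    return Flashcard(*(data[key] for key in required))


# Stand-in for model output; a real model call is out of scope here.
raw_output = json.dumps({
    "chinese_word": "蜂鸟",
    "english_definition": "hummingbird",
    "example_sentence": "A hummingbird hovered over the flower.",
})
card = parse_flashcard(raw_output)
print(card.english_definition)
```

The point of the typed result is that downstream code reads `card.english_definition` directly, and malformed output fails at one well-defined place instead of leaking into the UI.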

The entire process runs locally on the phone without internet access or API calls.

Behind this is Apple’s official coreai-models repository, which hosts pre-optimized open-source models such as Qwen, Mistral, and SAM3, all adapted for Apple Silicon. Developers can pull them down and run them in their own apps with just a few lines of Swift code.

If you don’t want to use a pre-built model, you can also deploy your own.

Apple has simultaneously open-sourced the coreai-torch toolkit on GitHub. With it, developers can convert a PyTorch model into the .aimodel format with just five lines of Python, then compile and deploy it in Xcode.

Project links:

https://github.com/apple/coreai-models

https://github.com/apple/coreai-torch

However, Apple wants more than just running models—it wants to unify all models.

The technical vehicle for this is the new Language Model protocol introduced in the Foundation Models framework. It defines a unified Swift API, so any model that conforms to it can be invoked with the same code.

Apple's on-device models follow this protocol, open-source models running on Core AI follow this protocol, and cloud-based large models like Claude and Gemini also follow this protocol.

One codebase, three kinds of models, and a seamless switch from local to cloud. Apple has turned itself into the routing layer for AI.
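The idea of one interface with interchangeable backends can be sketched as follows. This is a Python analogue under stated assumptions, not the Swift API itself: the `LanguageModel` protocol, its `respond` method, and the three backend classes are hypothetical names invented for illustration, and each backend simply echoes its input.

```python
from typing import Protocol


class LanguageModel(Protocol):
    """Hypothetical stand-in for the unified interface the article describes."""

    def respond(self, prompt: str) -> str: ...


class OnDeviceModel:
    def respond(self, prompt: str) -> str:
        return f"[on-device] {prompt}"


class OpenSourceModel:
    def respond(self, prompt: str) -> str:
        return f"[open-source, local] {prompt}"


class CloudModel:
    def respond(self, prompt: str) -> str:
        return f"[cloud] {prompt}"


def summarize(model: LanguageModel, text: str) -> str:
    # The calling code is identical regardless of which backend is passed in.
    return model.respond(f"Summarize: {text}")


backends = [OnDeviceModel(), OpenSourceModel(), CloudModel()]
replies = [summarize(m, "meeting notes") for m in backends]
for reply in replies:
    print(reply)
```

Because `summarize` depends only on the protocol, swapping a local model for a cloud one changes a single argument rather than the app logic.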

A 20-billion-parameter model is hidden in the phone's flash memory.

Running behind the Foundation Models framework is AFM 3, the third-generation proprietary model family jointly developed by Apple and Google, with five models released at once.

Two on the edge:

1. AFM 3 Core is a dense model with 3 billion parameters, designed for everyday lightweight tasks;

2. AFM 3 Core Advanced is a 20B-parameter sparse model, representing the upper limit for on-device performance on Apple platforms.

Three in the cloud:

1. AFM 3 Cloud is the primary server model;

2. ADM 3 Cloud specializes in image generation and editing (the technology behind Image Playground);

3. AFM 3 Cloud Pro is the most powerful in the entire family.

Among them, the on-device flagship is AFM 3 Core Advanced, a 20-billion-parameter model that runs directly on a smartphone.

On paper, a smartphone’s memory cannot hold a model of this scale. Conventional large models need all of their weights to fit in DRAM, and at 20 billion parameters even desktop-class machines struggle.

Apple's solution for this is called Instruction-Following Pruning.

The full model is stored in flash memory (NAND). Upon receiving a request, a lightweight routing module first selects which experts to activate, then loads only those corresponding weights into DRAM. The number of parameters actually activated each time ranges from one to four billion, depending on task complexity.

With one to four billion of its 20 billion parameters active per request, the model uses only 5% to 20% of its weights at any given time, while the rest stays in flash memory until it is needed.
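The 5% to 20% figure follows directly from the numbers above, and the load-on-demand idea can be shown with a toy model. In this sketch only the 20-billion total and the one-to-four-billion active range come from the article; the split into 20 experts of 1 billion parameters each, the `route` rule, and all names are assumptions made purely for illustration.

```python
from __future__ import annotations

TOTAL_PARAMS_B = 20.0  # full model size in billions, from the article


def active_fraction(active_params_b: float) -> float:
    """Share of the full model that is resident in DRAM for one request."""
    return active_params_b / TOTAL_PARAMS_B


low = active_fraction(1.0)
high = active_fraction(4.0)
print(f"{low:.0%} to {high:.0%}")

# Toy layout (assumed): 20 experts of 1B parameters each, all kept in "flash".
EXPERT_SIZE_B = 1.0
flash = {f"expert_{i}": EXPERT_SIZE_B for i in range(20)}


def route(task_complexity: int) -> list[str]:
    """Hypothetical router: pick 1 to 4 experts based on a complexity score."""
    count = max(1, min(4, task_complexity))
    return [f"expert_{i}" for i in range(count)]


def load_into_dram(names: list[str]) -> dict[str, float]:
    # Only the selected weights are copied out of flash.
    return {name: flash[name] for name in names}


dram = load_into_dram(route(task_complexity=3))
loaded_b = sum(dram.values())
print(f"{loaded_b:.0f}B loaded, {active_fraction(loaded_b):.0%} of the model")
```

A real router would choose experts based on the request content rather than a single integer score; the sketch only shows that DRAM usage scales with the selected subset, not with the full model.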

On the cloud side, the flagship is AFM 3 Cloud Pro, Apple's most powerful server model.

To tackle complex reasoning and Agent tool invocation, Apple, in collaboration with Google and NVIDIA, has extended Private Cloud Compute to NVIDIA GPUs on Google Cloud. Privacy rules remain unchanged—data stays within its boundary.

Real-world test results also validate the effectiveness of this architecture.

AFM 3 Core was judged superior on 45.6% of test prompts, versus just 23.3% for the previous generation. The gap is wider still for AFM 3 Cloud, at 64.7% versus 8.7%, a nearly one-sided result.

We've covered the architecture and benchmarks—now comes the question developers care about most: how much does this cost?

If your app receives fewer than 2 million downloads on the App Store, cloud inference on Private Cloud Compute is completely free—with zero API costs and zero token fees. Just focus on building your app.

This threshold precisely targets independent developers and small to medium-sized teams.

Three lines of code, Claude is at the table.

Among publicly released third-party integrations, Anthropic was the first to deliver.

On June 8, the same day the WWDC keynote ended, Anthropic released a Swift package that officially integrates with the Foundation Models framework; it is available starting June 9.

The idea is simple.

Apple's on-device models excel at lightweight tasks such as summarization, information extraction, and classification—offering fast performance, offline operation, and zero cost. However, they struggle with complex demands like multi-step reasoning, code generation, and web search.

Claude's Swift package fits precisely into this gap.

Developers invoke Apple's on-device models as usual through the Foundation Models framework. When a task exceeds what the on-device models can handle, the framework automatically routes the request to Claude, and the response streams back into the same SwiftUI view.

Users perceive no transition at all—it’s just one app to them.

In other words, if a note-taking or learning app you regularly use suddenly becomes smarter and can perform cross-document semantic analysis, it’s likely because the developers have integrated this package.

For example, a journaling app can use an on-device model to generate daily writing prompts, but when a user asks, “What are the common themes in my journals from the past few months?”, this cross-temporal semantic analysis is automatically handled by Claude.
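The hand-off described above, where light tasks stay local and heavier ones go to the cloud while the caller sees a single stream, can be sketched like this. The article does not say how the framework decides when to route, so the task-label rule in `pick_backend` is an assumption, and both stream functions are placeholders rather than real model calls.

```python
from typing import Callable, Iterator

# Task labels taken from the article's description of on-device strengths.
ON_DEVICE_TASKS = {"summarization", "extraction", "classification"}


def on_device_stream(prompt: str) -> Iterator[str]:
    # Placeholder for a local model; yields the reply word by word.
    for word in f"local reply to: {prompt}".split():
        yield word


def cloud_stream(prompt: str) -> Iterator[str]:
    # Placeholder for a remote model behind the same interface.
    for word in f"cloud reply to: {prompt}".split():
        yield word


def pick_backend(task: str) -> Callable[[str], Iterator[str]]:
    """Hypothetical rule: stay local for light tasks, otherwise use the cloud."""
    return on_device_stream if task in ON_DEVICE_TASKS else cloud_stream


def respond(task: str, prompt: str) -> Iterator[str]:
    # The caller consumes one stream and never sees which backend produced it.
    yield from pick_backend(task)(prompt)


light = " ".join(respond("summarization", "today's entry"))
heavy = " ".join(respond("multi-step reasoning", "themes across months"))
print(light)
print(heavy)
```

In the journaling example, the daily prompt would take the local path and the question about themes across months would take the cloud path, with the view code consuming `respond` the same way in both cases.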

However, Anthropic's presence in the Apple ecosystem extends beyond this step.

The Claude Agent was integrated into Xcode 26.3 as early as February this year, helping developers write code, run tests, and automate tasks.

The two integrations serve different audiences: Claude in Xcode is for developers themselves, while Claude in Foundation Models is for the end users of their apps.

For Anthropic, this is a long-overdue but crucial ticket into consumer distribution.

Claude has been aggressively targeting developers and enterprise markets but has almost no presence among general consumers.

This time, Apple's Foundation Models framework has given it a pathway to reach a billion users.

2.5 billion devices, one arena

Looking back at all of Cook's actions at his final WWDC, one thread runs throughout.

Apple doesn't want to be an AI model company. It entrusted Siri's core to Google, handed the runtime for open-source models to Core AI, and gave users the choice of third-party AI options.

What it wants to be is the arena.

2.5 billion devices, a unified Language Model protocol, and a comprehensive orchestration framework from edge to cloud.

The better a model is, the further it can reach into the world’s largest high-value user base through this platform.

As of today, the competition among AI giants has a brand-new dimension.

Previously, Anthropic and OpenAI competed over API call volumes, developer tools, and enterprise contracts.

Now Apple has moved the battlefield into everyone’s pocket: whoever secures the position of default AI engine will win the next round.

On September 1, John Ternus assumed the role of Apple’s CEO, inheriting not just a hardware company, but also the AI arena left behind by Cook.
