Introducing the Third Generation of Apple’s Foundation Models

· Source: Apple Machine Learning Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Advanced, long

Summary

Apple introduced its third generation of Apple Foundation Models (AFM) on June 8, 2026: a family of five custom-built models developed with Google. The models power Apple Intelligence, which is deeply integrated into Apple's operating systems with privacy at its core, and they run either on-device or via Private Cloud Compute. Two models run on-device: AFM 3 Core, a 3-billion-parameter dense model, and AFM 3 Core Advanced, a 20-billion-parameter sparse multimodal model that activates 1 to 4 billion parameters. Three models run on servers: AFM 3 Cloud (the workhorse), ADM 3 Cloud (Image) for image generation and editing, and AFM 3 Cloud Pro (the most capable), which extends Private Cloud Compute to NVIDIA GPUs in Google Cloud. In evaluations against their predecessors, AFM 3 Core reached a text preference of 45.6% and AFM 3 Cloud reached 64.7%. AFM 3 Core Advanced achieved a mean opinion score (MOS) of 4.15 for expressive voices and a 44.7% preference for dictation quality.

Key takeaway

For AI architects and machine learning engineers designing privacy-centric, high-performance AI systems, Apple's third-generation Foundation Models offer a blueprint. Consider sparse-activation techniques such as Instruction-Following Pruning (IFP) for on-device scalability: keeping only the active subset of a large model in memory lets it run on consumer hardware. Also explore secure server-side inference via Private Cloud Compute, potentially extended to hybrid cloud environments with partners such as Google and NVIDIA, to balance advanced capabilities with stringent data privacy guarantees.

Key insights

Apple's third-generation foundation models combine sparse on-device architectures with Private Cloud Compute to deliver more capable on-device and server AI while preserving privacy.

Method

AFM 3 Core Advanced uses Instruction-Following Pruning (IFP): the full model is stored in flash memory (NAND), a small subset of "experts" is selectively loaded into DRAM for each prompt, and that subset is periodically reselected during generation.
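
The load-and-reselect loop described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Apple's implementation: the `ExpertCache` class, the `score_experts` stand-in (which produces pseudo-random scores instead of a learned, instruction-conditioned selector), the expert count, the number of DRAM slots, and the reselection interval are all hypothetical and chosen only to make the mechanism visible.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ExpertCache:
    """Toy model: all experts live in 'flash', only a few fit in 'DRAM'."""
    num_experts: int                      # hypothetical size of the flash-resident pool
    dram_slots: int                       # hypothetical number of experts DRAM can hold
    resident: set = field(default_factory=set)
    loads: int = 0                        # count of simulated flash -> DRAM transfers

    def select(self, scores):
        """Make the top-scoring experts resident, loading only the missing ones."""
        ranked = sorted(range(self.num_experts), key=lambda i: scores[i], reverse=True)
        wanted = set(ranked[: self.dram_slots])
        self.loads += len(wanted - self.resident)
        self.resident = wanted
        return wanted


def score_experts(context, num_experts):
    """Stand-in for a learned selector: deterministic pseudo-scores per context."""
    rng = random.Random(" ".join(context))  # string seeds are reproducible across runs
    return [rng.random() for _ in range(num_experts)]


def generate(prompt_tokens, steps, cache, reselect_every):
    """Select experts once per prompt, then reselect every `reselect_every` steps."""
    context = list(prompt_tokens)
    cache.select(score_experts(context, cache.num_experts))
    for step in range(1, steps + 1):
        # A real model would run its forward pass here using only cache.resident.
        context.append(f"tok{step}")
        if step % reselect_every == 0:
            cache.select(score_experts(context, cache.num_experts))
    return context


cache = ExpertCache(num_experts=16, dram_slots=4)
output = generate(["summarize", "this", "email"], steps=8, cache=cache, reselect_every=4)
print(sorted(cache.resident), cache.loads)
```

The point of the sketch is the accounting: the DRAM working set never exceeds `dram_slots`, and flash reads happen only when a reselection changes which experts are wanted, so a longer reselection interval trades adaptivity for fewer transfers.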

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.