Introducing the Third Generation of Apple’s Foundation Models

2026-06-08 · Source: Apple Machine Learning Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Advanced, long

Summary

Apple introduced the third generation of its Apple Foundation Models (AFM) on June 8, 2026: a family of five custom-built models developed with Google. The models power Apple Intelligence, which is integrated into Apple's operating systems with privacy at its core, and they run either on-device or via Private Cloud Compute. Two models run on-device: AFM 3 Core (a 3-billion-parameter dense model) and AFM 3 Core Advanced (a 20-billion-parameter sparse, multimodal model that activates 1-4 billion parameters). Three are server-based: AFM 3 Cloud (the workhorse), ADM 3 Cloud (Image) for image generation and editing, and AFM 3 Cloud Pro (the most capable), which extends Private Cloud Compute to NVIDIA GPUs in Google Cloud. Evaluations report text preference rates of 45.6% for AFM 3 Core and 64.7% for AFM 3 Cloud over their predecessors. AFM 3 Core Advanced achieved a mean opinion score (MOS) of 4.15 for expressive voices and a 44.7% preference rate for dictation quality.

Key takeaway

For AI architects and machine learning engineers designing privacy-centric, high-performance AI systems, Apple's third-generation foundation models offer a blueprint. Consider sparse activation architectures such as Instruction-Following Pruning (IFP) for on-device scalability: they let large models run on consumer hardware by keeping only the active subset of weights in memory. Also explore secure server-side inference via Private Cloud Compute, potentially extended to hybrid cloud environments with partners such as Google and NVIDIA, to balance advanced capabilities with stringent data privacy guarantees.

Key insights

Apple's third-generation foundation models combine novel sparse architectures with Private Cloud Compute to deliver on-device and server-side AI while preserving privacy.

Principles

Store the full model in flash and load only the selected experts into DRAM per prompt for on-device scalability.
Implement architectural refinements like PT-MoE for server-side multimodal reasoning.
Train on diverse data that excludes users' private data, and respect publisher opt-outs.
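
To make the first principle concrete, the following back-of-the-envelope sketch estimates weight storage for the parameter counts quoted in the summary (20 billion total, 1-4 billion active). The 4-bit weight width, the constant names, and the `summarize` helper are assumptions for illustration; the summary does not state the precision Apple uses, and the estimate ignores activations, the KV cache, and any weights that stay resident regardless of the prompt.

```python
GB = 1e9  # decimal gigabyte

TOTAL_PARAMS = 20e9          # from the summary: AFM 3 Core Advanced, total
ACTIVE_PARAMS = (1e9, 4e9)   # from the summary: activated range
ASSUMED_BITS = 4             # ASSUMPTION: bits per weight, not stated in the source


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Storage needed for `params` weights at `bits_per_weight` bits each, in GB."""
    return params * bits_per_weight / 8 / GB


def summarize(bits: float = ASSUMED_BITS) -> dict[str, float]:
    """Weight storage for the full model (flash) and the active subset (DRAM)."""
    return {
        "flash_total_gb": weight_gb(TOTAL_PARAMS, bits),
        "dram_active_min_gb": weight_gb(ACTIVE_PARAMS[0], bits),
        "dram_active_max_gb": weight_gb(ACTIVE_PARAMS[1], bits),
    }
```

Under the assumed 4-bit width, the full weights would occupy about 10 GB of flash while the active subset would need roughly 0.5 to 2 GB of DRAM, which illustrates why keeping inactive experts out of DRAM matters on consumer hardware.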

Method

AFM 3 Core Advanced uses Instruction-Following Pruning (IFP) to store the full model in flash memory (NAND) and selectively load a small subset of "experts" into DRAM per prompt, periodically reselecting them during generation.
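
The load-and-reselect loop described above can be sketched as a small cache simulation. This is a toy model under stated assumptions, not Apple's implementation: `ExpertCache`, `score_experts`, and `run` are hypothetical names, the modulo-based scoring rule is a stand-in for whatever learned selector IFP actually uses, and the generated tokens are passed in rather than decoded by a model.

```python
from dataclasses import dataclass, field


@dataclass
class ExpertCache:
    flash: dict[int, list[float]]   # expert_id -> weights; stands in for NAND storage
    budget: int                     # max number of experts resident in DRAM
    dram: dict[int, list[float]] = field(default_factory=dict)
    loads: int = 0                  # count of flash -> DRAM copies

    def select(self, scores: dict[int, float]) -> list[int]:
        """Keep the top-`budget` experts in DRAM, loading only the missing ones."""
        chosen = sorted(scores, key=lambda e: (-scores[e], e))[: self.budget]
        for e in list(self.dram):
            if e not in chosen:
                del self.dram[e]            # evict experts no longer selected
        for e in chosen:
            if e not in self.dram:
                self.dram[e] = self.flash[e]
                self.loads += 1
        return sorted(chosen)


def score_experts(context: list[int], num_experts: int) -> dict[int, float]:
    """HYPOTHETICAL scoring rule: each token votes for expert (token % num_experts)."""
    scores = {e: 0.0 for e in range(num_experts)}
    for tok in context:
        scores[tok % num_experts] += 1.0
    return scores


def run(prompt: list[int], generated: list[int], cache: ExpertCache,
        reselect_every: int) -> list[list[int]]:
    """Select experts for the prompt, then reselect every `reselect_every` tokens."""
    n = len(cache.flash)
    context = list(prompt)
    history = [cache.select(score_experts(context, n))]
    for step, tok in enumerate(generated, start=1):
        context.append(tok)  # placeholder for a token decoded with the resident experts
        if step % reselect_every == 0:
            history.append(cache.select(score_experts(context, n)))
    return history
```

The `loads` counter makes the design trade-off visible: reselecting more often tracks the context more closely but triggers more flash reads, while experts that stay selected are never reloaded.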

In practice

Use sparse activation architectures to scale large models beyond traditional DRAM limits on consumer hardware.
Apply quantization-aware training (QAT) to compress models while maintaining high accuracy on the target hardware.
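
As a minimal illustration of the quantization-aware training idea, the sketch below trains a single weight whose forward pass uses a fake-quantized value while the gradient updates a full-precision latent copy (the common straight-through estimator). The symmetric 4-bit grid, the learning rate, and the function names are assumptions chosen for illustration; the summary does not describe Apple's QAT recipe.

```python
def fake_quantize(w: float, bits: int, w_max: float) -> float:
    """Snap w to the nearest level of a symmetric signed grid, returned as a float."""
    levels = 2 ** (bits - 1) - 1             # e.g. 7 positive levels for 4 bits
    scale = w_max / levels
    q = round(w / scale)
    q = max(-levels, min(levels, q))         # clip to the representable range
    return q * scale


def train_qat(data: list[tuple[float, float]], bits: int = 4, w_max: float = 1.0,
              lr: float = 0.05, epochs: int = 200) -> tuple[float, float]:
    """Fit y = w * x with squared error, using the quantized weight in the forward pass.

    Returns (latent full-precision weight, quantized weight used at inference).
    """
    w = 0.0                                   # latent full-precision weight
    for _ in range(epochs):
        for x, y in data:
            wq = fake_quantize(w, bits, w_max)    # forward pass sees quantized weight
            err = wq * x - y
            grad = 2 * err * x                    # straight-through: d(wq)/dw treated as 1
            w -= lr * grad
            w = max(-w_max, min(w_max, w))        # keep the latent weight in range
    return w, fake_quantize(w, bits, w_max)
```

Because the loss is computed on the quantized weight, training settles on a value that works well after rounding, instead of one that is only accurate at full precision.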

Topics

Apple Foundation Models
On-device AI
Private Cloud Compute
Sparse Architectures
Multimodal AI
Responsible AI

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.