Apple Launches Core AI for Apple-Silicon Optimized On-Device Generative AI

2026-06-20 · Source: InfoQ · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, quick

Summary

Apple introduced the Core AI framework at WWDC 26 as the successor to Core ML, built to run large language models and other generative AI on device. Its unified architecture supports models ranging from compact 3B-parameter vision models to 70B-parameter LLMs across iPhone, iPad, Mac, and Apple Vision Pro. Core AI underpins Apple Intelligence and is now open to developers building "custom intelligence." It requires Apple Silicon. Because inference runs locally, user data stays on the device and there are no cloud inference costs. Key capabilities include unified access to the CPU, GPU, and Neural Engine; a memory-safe Swift API with zero-copy data paths; and ahead-of-time compilation for fast model loading. Developers convert PyTorch models with Core AI PyTorch and apply compression techniques such as quantization and palettization, which are critical for efficient on-device performance.
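Some quick arithmetic shows why compression matters at the top of that 3B to 70B range. The sketch below estimates weights-only storage at different bit widths; it assumes 1 GB = 10^9 bytes and deliberately ignores the KV cache, activations, and runtime overhead, so real memory use is higher. The function name `weight_footprint_gb` is illustrative, not part of Core AI.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Weights only: KV cache, activations, and runtime overhead are not counted.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Compare the two model sizes mentioned in the summary at common bit widths.
for name, params in [("3B model", 3e9), ("70B model", 70e9)]:
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: {weight_footprint_gb(params, bits):.1f} GB")
```

Even at 4 bits per weight, a 70B model needs roughly 35 GB for its weights alone, while a 3B model drops to about 1.5 GB. That gap suggests why quantization and palettization are central to the on-device workflow.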

Key takeaway

For machine learning engineers building on Apple platforms, Core AI offers a direct path to deploying generative AI and LLMs on device. Running models on Apple Silicon keeps user data local and eliminates per-token cloud costs. Convert your PyTorch models and apply quantization to reduce their memory footprint and improve performance. Keep in mind that Core AI is newly launched, so how its ecosystem grows over time will shape its long-term value for your projects.

Key insights

Apple's Core AI enables efficient, private on-device generative AI and LLMs across its hardware ecosystem.

Principles

On-device AI prioritizes privacy and cost efficiency.
Unified APIs simplify hardware resource management.
AOT compilation improves model load times.

Method

Convert PyTorch models to Core AI's `AIProgram` using `TorchConverter`, then apply optimization techniques like quantization and palettization for deployment.
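Palettization, named in the Method above, replaces each weight with an index into a small table ("palette") of shared values. The sketch below shows the generic idea using 1-D k-means in plain Python. It is not Core AI's API, and the source does not describe which clustering method or granularity Core AI uses; `palettize` and `depalettize` are hypothetical names.

```python
def palettize(weights, n_bits=2, iters=25):
    """Cluster scalar weights into a palette of 2**n_bits centroids (1-D k-means).

    Returns (lut, indices): lut is the list of centroid values and
    indices[i] is the palette slot that approximates weights[i].
    """
    k = 2 ** n_bits  # n_bits >= 1, so k >= 2
    lo, hi = min(weights), max(weights)
    # Initialise centroids evenly spaced across the weight range.
    lut = [lo + (hi - lo) * j / (k - 1) for j in range(k)]

    def assign():
        # Map each weight to the index of its nearest centroid.
        return [min(range(k), key=lambda j: abs(w - lut[j])) for w in weights]

    for _ in range(iters):
        indices = assign()
        # Move each centroid to the mean of the weights assigned to it.
        for j in range(k):
            members = [w for w, idx in zip(weights, indices) if idx == j]
            if members:
                lut[j] = sum(members) / len(members)
    # Final assignment against the converged palette.
    return lut, assign()


def depalettize(lut, indices):
    """Reconstruct approximate weights by looking up each index in the palette."""
    return [lut[i] for i in indices]


weights = [-0.9, -0.85, -0.1, 0.0, 0.05, 0.5, 0.55, 1.0]
lut, indices = palettize(weights, n_bits=2)
approx = depalettize(lut, indices)
print("palette:", [round(c, 3) for c in lut])
print("max error:", max(abs(w - a) for w, a in zip(weights, approx)))
```

With `n_bits=2`, each weight is stored as a 2-bit index plus one shared 4-entry table, instead of a 16- or 32-bit float. Production tools typically cluster per tensor or per channel and use more careful initialisation than this linear start.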

In practice

Convert existing PyTorch models for Apple Silicon.
Implement quantization to reduce model footprint.
Use `SpecializationOptions` for model caching.
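The quantization step in the list above can be illustrated with a minimal sketch of symmetric per-tensor int8 linear quantization in plain Python. This is the generic technique, not Core AI's API; `quantize_int8` and `dequantize_int8` are hypothetical names, and the source does not say which quantization schemes or bit widths Core AI supports.

```python
def quantize_int8(weights):
    """Symmetric per-tensor linear quantization of floats to int8 codes.

    Returns (codes, scale) such that weights[i] is approximately codes[i] * scale.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [c * scale for c in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
print(codes)
print([round(r, 4) for r in restored])
```

Each weight now occupies 8 bits instead of 16 for fp16, halving weight memory, and the rounding error per weight is at most half the scale. Real deployments often use per-channel or per-block scales and lower bit widths to preserve accuracy at larger compression ratios.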

Topics

Core AI Framework
On-device Generative AI
Apple Silicon Optimization
PyTorch Model Conversion
Model Quantization
LLM Deployment

Best for: CTO, AI Architect, Computer Vision Engineer, AI Engineer, Machine Learning Engineer, NLP Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.