Apple working to cram massive Gemini model into iPhone to power new Siri

2026-05-28 · Source: AI - Ars Technica · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, short

Summary

Apple is working to integrate Google's Gemini AI model into Siri on iPhones, delivering a feature it first promised in 2024 and has delayed multiple times. The integration, expected later this year, will likely take a hybrid approach that combines on-device processing with cloud-based AI, despite Apple's historical preference for local AI on privacy grounds. Although Apple's Neural Engine is designed for efficient AI inference, smartphones generally lack the RAM and processing power to run massive models like Gemini, which have trillions of parameters, compared with the few billion in typical on-device models. Apple is distilling large Gemini models for local use. For complex Siri requests that must go to the cloud, Apple has reportedly partnered with Nvidia to use its Confidential Computing platform, which addresses privacy concerns by keeping data encrypted while it is processed on Nvidia GPUs; the service may run under Apple's Private Cloud Compute branding.

Key takeaway

If you are an AI architect evaluating on-device AI strategies, recognize that even with optimized silicon, large conversational models like Gemini require a hybrid cloud approach. Privacy-focused solutions may need confidential computing partnerships, such as Nvidia's, to process complex requests securely off-device. Be aware that this hybrid model, while enabling more advanced AI, may add noticeable latency compared with purely local processing.
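A hybrid design needs some policy for deciding which requests stay on the phone and which go to the cloud. The sketch below is purely illustrative: the threshold `ON_DEVICE_MAX_PROMPT_TOKENS`, the `needs_world_knowledge` flag, and the two route names are hypothetical and do not describe Apple's actual routing logic.

```python
from dataclasses import dataclass

# Hypothetical limit on what the small on-device model is trusted to handle.
ON_DEVICE_MAX_PROMPT_TOKENS = 2_000


@dataclass
class Request:
    prompt_tokens: int           # size of the prompt after tokenization
    needs_world_knowledge: bool  # hypothetical flag: open-ended question beyond personal context


def route(req: Request) -> str:
    """Return 'on_device' or 'private_cloud' (illustrative policy only)."""
    if req.needs_world_knowledge or req.prompt_tokens > ON_DEVICE_MAX_PROMPT_TOKENS:
        # Complex requests go off-device, accepting the extra network round trip.
        return "private_cloud"
    # Short, personal-context requests stay local: lower latency, data never leaves the phone.
    return "on_device"


print(route(Request(prompt_tokens=150, needs_world_knowledge=False)))
```

Every request routed to `private_cloud` pays a network round trip, which is where the latency cost mentioned above comes from.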

Key insights

Integrating large AI models like Gemini into smartphones requires a hybrid on-device and cloud approach due to hardware limitations.
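The hardware limitation is mostly a matter of memory arithmetic: the weights alone take roughly parameters × bits per weight ÷ 8 bytes. The sketch below ignores activations and the KV cache, and the model sizes are round illustrative numbers; Google does not publish Gemini's exact parameter count.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# A few-billion-parameter on-device model, quantized to 4 bits per weight.
small = weight_memory_gb(3e9, 4)      # 1.5 GB
# A hypothetical 1-trillion-parameter cloud model stored at 16 bits per weight.
large = weight_memory_gb(1e12, 16)    # 2000 GB
print(small, large)
```

The small model fits alongside apps in a phone's RAM; the large one needs on the order of terabytes, which is why it stays in the data center.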

Principles

On-device AI models are significantly smaller than cloud models and are often quantized to lower-precision weights.
Smartphone GPUs can outperform NPUs at generating tokens for general-purpose AI workloads.
Distillation transfers capabilities from large to small models.
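Quantization, mentioned in the principles above, stores each weight with fewer bits. The sketch shows symmetric per-tensor int8 quantization, one common scheme chosen here for illustration; the source does not say which scheme Apple or Google uses, and production on-device models may use lower bit widths.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values and the shared scale."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, restored)
```

Each weight now takes 1 byte instead of 4 (float32), at the cost of a rounding error of at most half a quantization step.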

Method

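Distillation trains a smaller "student" model to mimic the outputs of a larger, resource-intensive "teacher" model. It is often combined with pruning, which removes less important weights, so that the teacher's most useful capabilities survive in a model small enough to run on a phone.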
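A minimal sketch of both ideas, assuming the classic soft-target formulation of distillation (KL divergence between temperature-softened teacher and student distributions, scaled by T squared) and simple magnitude pruning. These are textbook techniques used for illustration, not a description of how Apple distills Gemini.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by temperature**2."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    drop = set(smallest)
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


teacher = [1.0, 3.0, 0.5]
print(distillation_loss([0.9, 2.8, 0.6], teacher))  # small: student nearly matches teacher
print(magnitude_prune([0.1, -2.0, 0.05, 1.0], 0.5))
```

During training, the student's weights are updated to minimize this loss, so it learns the teacher's full output distribution rather than only its top answer.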

In practice

Use Gemini Nano for contextual on-device features like summarization.
Implement confidential computing for cloud-based AI to enhance privacy.
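Confidential computing generally relies on attestation: before sending data, the client checks that the server is running approved software. The sketch below reduces that check to comparing a reported software measurement against an expected hash. The image name and measurement are hypothetical, and real platforms verify hardware-signed attestation reports against a vendor certificate chain, which this sketch omits.

```python
import hashlib
import hmac

# Hypothetical: the server software image the client has been told to trust.
APPROVED_IMAGE = b"private-cloud-inference-build-1.0"
EXPECTED_MEASUREMENT = hashlib.sha256(APPROVED_IMAGE).hexdigest()


def may_send_request(reported_measurement: str) -> bool:
    """Release user data only if the server's attested measurement matches the approved one."""
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(reported_measurement, EXPECTED_MEASUREMENT)


tampered = hashlib.sha256(b"modified-build").hexdigest()
print(may_send_request(EXPECTED_MEASUREMENT), may_send_request(tampered))
```

Only after this check passes would the client encrypt the request to a key bound to the attested environment.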

Topics

On-device AI
Cloud AI
Gemini Model
Siri Integration
Confidential Computing
AI Model Distillation

Best for: AI Product Manager, Investor, CTO, AI Architect, Director of AI/ML, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.