Why Qualcomm Beats Apple in Chips (Explained in 6 Mins)
Summary
Qualcomm's Snapdragon chip significantly outperforms Apple's M5 in AI benchmarks, scoring over 88,000 points to Apple's 57,000, roughly 54% higher. This advantage stems from Qualcomm's Hexagon NPU, which uses a heterogeneous architecture with specialized scalar, vector, and matrix processors so that each task runs on the unit best suited to it. The Snapdragon also includes a Sensing Hub that handles background tasks at ultra-low power, sparing the main NPU, and it splits computational workloads into tiles to reduce accesses to external RAM. Apple, by contrast, prioritizes perceived speed: unified memory and its Metal for Tensor Ops give it a faster "time to first token." Two caveats apply to the comparison. Qualcomm's high scores rely partly on INT4 quantization, which may reduce precision, and Snapdragon benchmarks often come from actively cooled devices, whereas Apple's fanless designs are more susceptible to thermal throttling.
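The 54% figure is the gap measured relative to Apple's score; measured from Qualcomm's side, the same gap is about 35%. A quick check using the rounded scores quoted above (Qualcomm's exact figure is only given as "over 88,000", so these values are approximate):

```python
# Approximate scores as quoted in the summary.
snapdragon_score = 88_000
apple_m5_score = 57_000

gap = snapdragon_score - apple_m5_score        # absolute gap in points
lead_over_apple = gap / apple_m5_score         # gap relative to Apple's score
deficit_vs_qualcomm = gap / snapdragon_score   # gap relative to Qualcomm's score

print(f"Snapdragon is {lead_over_apple:.1%} higher than M5")    # 54.4%
print(f"M5 is {deficit_vs_qualcomm:.1%} lower than Snapdragon")  # 35.2%
```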
Key takeaway
For machine learning engineers evaluating on-device AI hardware: Qualcomm's Snapdragon offers significantly higher raw AI throughput, particularly for sustained, heavy workloads, thanks to its specialized NPU and thermal management. However, its headline scores rely partly on INT4 quantization, which may raise precision concerns, and Apple's M5 excels at perceived responsiveness ("time to first token") in user-facing applications. When choosing, balance raw performance, power efficiency, precision requirements, and user-experience goals.
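To see why INT4 raises precision concerns, the sketch below applies symmetric per-tensor quantization at 8 and 4 bits to a synthetic Gaussian "weight" tensor and measures the round-trip error. This is one common scheme chosen for illustration; the summary does not say which scheme the benchmarked models use, and production toolchains usually use per-channel or per-group scales, which shrink the error.

```python
import random

def quantize_symmetric(values, bits):
    """Symmetric per-tensor quantization to signed `bits`-bit codes, then back to floats."""
    qmax = 2 ** (bits - 1) - 1      # 7 for INT4, 127 for INT8
    qmin = -(2 ** (bits - 1))       # -8 for INT4, -128 for INT8
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:                  # all-zero tensor: nothing to quantize
        return list(values), [0] * len(values)
    codes = [max(qmin, min(qmax, round(v / scale))) for v in values]
    return [c * scale for c in codes], codes

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

random.seed(0)
# Synthetic stand-in for a layer's weights; not data from any real model.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

for bits in (8, 4):
    restored, codes = quantize_symmetric(weights, bits)
    err = mean_abs_error(weights, restored)
    print(f"INT{bits}: {len(set(codes))} distinct levels, mean abs error {err:.4f}")
```

INT4 leaves at most 16 representable levels per scale, so the quantization step is roughly 18 times coarser than INT8's, which is the source of the precision trade-off behind the higher throughput.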
Key insights
Qualcomm's heterogeneous NPU architecture and power management deliver superior raw AI benchmark performance over Apple's M5.
Principles
- Specialized hardware accelerates specific computational tasks.
- Low-power dedicated processors optimize routine background operations.
- Tile-based processing reduces memory bottlenecks.
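The first principle, routing each kind of work to hardware built for it, can be sketched as a toy dispatcher. The op-to-unit table and the `Op`/`dispatch` names are hypothetical illustrations, not Hexagon's actual scheduler or any Qualcomm API; they only mirror the scalar/vector/matrix split described above.

```python
from dataclasses import dataclass

# Hypothetical mapping from operation kind to the processing unit suited to it.
UNIT_FOR_OP = {
    "matmul": "matrix",      # dense tensor contractions
    "conv2d": "matrix",
    "add": "vector",         # element-wise arithmetic over many values
    "relu": "vector",
    "softmax": "vector",
    "branch": "scalar",      # control flow and bookkeeping
    "index_calc": "scalar",
}

@dataclass
class Op:
    name: str
    kind: str

def dispatch(ops):
    """Group ops by the unit that would run them; unknown kinds fall back to scalar."""
    queues = {"scalar": [], "vector": [], "matrix": []}
    for op in ops:
        queues[UNIT_FOR_OP.get(op.kind, "scalar")].append(op.name)
    return queues

graph = [Op("qkv_proj", "matmul"), Op("bias", "add"), Op("attn_probs", "softmax"),
         Op("loop_ctrl", "branch"), Op("ffn_up", "matmul"), Op("act", "relu")]
print(dispatch(graph))
```

Falling back to the scalar unit for unrecognized ops reflects the general-purpose role a scalar core plays in such designs; a real scheduler would also weigh data movement and unit occupancy.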
Method
Qualcomm's Hexagon NPU divides AI workloads across scalar, vector, and matrix processors, uses a Sensing Hub for low-power tasks, and breaks computations into small tiles that fit in local memory.
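Tiling is easiest to see on a matrix multiply. The sketch below computes the product one block at a time, so each inner step touches only a `tile x tile` block of each operand and of the output, the working set a small on-chip buffer would hold. It is a generic blocked-matmul illustration in plain Python, not Qualcomm's implementation; the function names are hypothetical.

```python
def tiled_matmul(a, b, tile):
    """Multiply a (n x k) by b (k x m), one tile x tile block at a time."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Accumulate one block of a times one block of b into one block of c.
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += a_ip * b[p][j]
    return c

def naive_matmul(a, b):
    """Reference product for checking the tiled version."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

a = [[float(i + j) for j in range(5)] for i in range(6)]   # 6 x 5
b = [[float(i - j) for j in range(7)] for i in range(5)]   # 5 x 7
print(tiled_matmul(a, b, tile=2) == naive_matmul(a, b))    # True
```

With tile size T, each step's working set is three T x T blocks, so T would be chosen so that 3 * T^2 * (bytes per element) fits in local memory; the result is identical, but far fewer values have to come from external RAM at once.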
In practice
- Use heterogeneous computing for AI acceleration.
- Implement dedicated low-power cores for background sensing.
- Break large workloads into smaller tiles that fit in local memory.
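The low-power sensing pattern amounts to screening every input cheaply and waking the expensive model only when something looks relevant. The sketch below simulates that with a per-frame energy check; `run_always_on`, the threshold, and the lambda standing in for a main-NPU model are all hypothetical and do not represent the Sensing Hub's actual interface.

```python
def run_always_on(frames, wake_threshold, heavy_model):
    """Screen each frame with a cheap check; call the expensive model only on candidates.

    The energy check stands in for work a low-power sensing core would do,
    and `heavy_model` stands in for an inference on the main NPU.
    """
    heavy_calls = 0
    results = []
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)   # cheap per-frame check
        if energy >= wake_threshold:
            heavy_calls += 1
            results.append(heavy_model(frame))
    return results, heavy_calls

quiet = [0.01, -0.02, 0.01, 0.0]
loud = [0.9, -0.8, 0.7, -0.9]
frames = [quiet] * 95 + [loud] * 5
results, heavy_calls = run_always_on(frames, wake_threshold=0.1, heavy_model=lambda f: max(f))
print(f"heavy model ran on {heavy_calls} of {len(frames)} frames")  # 5 of 100
```

The power saving comes from the ratio: the cheap check runs on every frame, while the costly path runs only on the few frames that pass it.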
Topics
- AI Benchmarking
- NPU Architecture
- Heterogeneous Computing
- On-device AI
- INT4 Quantization
Best for: Machine Learning Engineer, AI Engineer, AI Product Manager, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Bug.