Ollama is now powered by MLX on Apple Silicon in preview
Summary
Ollama has released a preview version, 0.19, that significantly improves performance on Apple Silicon by integrating Apple's MLX machine learning framework. The update takes advantage of Apple Silicon's unified memory architecture. On M5, M5 Pro, and M5 Max chips, it also uses the new GPU Neural Accelerators to speed up both time to first token (TTFT) and generation. In benchmarks run on March 29, 2026, with Alibaba's Qwen3.5-35B-A3B model, Ollama 0.19 reached 1810 tokens/s prefill and 134 tokens/s decode with NVFP4 quantization. Ollama 0.18 reached 1154 tokens/s prefill, so prefill is roughly 1.57x faster. The new version also adds support for NVIDIA's NVFP4 format, which aims to maintain model accuracy while reducing memory footprint. It also brings an upgraded caching system that improves responsiveness in agentic and coding tasks.
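The prefill and decode figures above are throughput rates, and you can reproduce them from your own runs. The sketch below assumes the timing fields that Ollama's `/api/generate` endpoint returns in its final, non-streamed response: `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, and `eval_duration`, with durations in nanoseconds. The helper `prefill_seconds` is hypothetical. It estimates only the prompt-processing part of TTFT, and the 10,000-token prompt size is an assumed example.

```python
# Sketch: compute prefill/decode throughput from an Ollama /api/generate response.
# Assumes the non-streamed response includes prompt_eval_count, prompt_eval_duration,
# eval_count and eval_duration, with durations reported in nanoseconds.

NS_PER_S = 1_000_000_000


def throughput(stats: dict) -> tuple[float, float]:
    """Return (prefill_tokens_per_s, decode_tokens_per_s)."""
    prefill = stats["prompt_eval_count"] / (stats["prompt_eval_duration"] / NS_PER_S)
    decode = stats["eval_count"] / (stats["eval_duration"] / NS_PER_S)
    return prefill, decode


def speedup(new_rate: float, old_rate: float) -> float:
    """Ratio of two throughputs; a value above 1 means the new rate is faster."""
    return new_rate / old_rate


def prefill_seconds(prompt_tokens: int, prefill_rate: float) -> float:
    """Hypothetical helper: prompt-processing time only (ignores model load and cache hits)."""
    return prompt_tokens / prefill_rate


if __name__ == "__main__":
    # Prefill rates quoted in the summary (Ollama 0.19 vs 0.18, Qwen3.5-35B-A3B).
    print(f"prefill speedup: {speedup(1810, 1154):.2f}x")
    # Illustrative 10,000-token prompt (an assumed size, not from the article).
    for version, rate in [("0.18", 1154), ("0.19", 1810)]:
        print(f"{version}: ~{prefill_seconds(10_000, rate):.1f} s to process the prompt")
```

The printed per-prompt times cover only the prompt-processing component of TTFT. Model loading adds to TTFT, and prefix-cache reuse can shorten it.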
Key takeaway
For NLP engineers and developers running local LLMs on Apple Silicon, upgrading to the Ollama 0.19 preview is worthwhile for its substantial performance gains, especially with coding agents and personal assistants. Powered by MLX and NVFP4, the update delivers faster inference and better memory efficiency, making it easier to run large models such as Qwen3.5-35B-A3B locally. The preview requires a Mac with more than 32 GB of unified memory.
Key insights
Ollama 0.19 integrates Apple's MLX and NVIDIA's NVFP4 for faster, more efficient local LLM inference on Apple Silicon.
Principles
- A unified memory architecture, in which the CPU and GPU share one memory pool, avoids copying model weights between them and improves ML performance.
- Low-precision formats like NVFP4 balance accuracy and efficiency.
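The sketch below illustrates the NVFP4 principle with a simplified form of block-scaled 4-bit quantization. It assumes NVFP4's layout of 16-value blocks that each share one scale. Each value is rounded to the FP4 (E2M1) grid {0, 0.5, 1, 1.5, 2, 3, 4, 6}. It deliberately omits the FP8 (E4M3) encoding of the block scale and the per-tensor FP32 scale, and the function names are hypothetical.

```python
# Simplified NVFP4-style block quantization (assumption-laden sketch):
# - each block of 16 values shares one scale factor;
# - each value is rounded to the nearest FP4 (E2M1) magnitude, keeping its sign.
# Real NVFP4 also stores the block scale in FP8 (E4M3) and applies a per-tensor
# FP32 scale; both are omitted here for clarity.

BLOCK_SIZE = 16
FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = FP4_E2M1_MAGNITUDES[-1]


def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Return (scale, fp4_values) for one block of at most BLOCK_SIZE floats."""
    if len(block) > BLOCK_SIZE:
        raise ValueError(f"block has {len(block)} values; max is {BLOCK_SIZE}")
    amax = max((abs(x) for x in block), default=0.0)
    # Map the largest magnitude in the block onto FP4's largest value (6.0).
    scale = amax / FP4_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale: float, codes: list[float]) -> list[float]:
    """Reconstruct approximate values from the block scale and FP4 codes."""
    return [c * scale for c in codes]


def quantize(values: list[float]) -> list[tuple[float, list[float]]]:
    """Split a flat list into 16-value blocks and quantize each independently."""
    return [quantize_block(values[i:i + BLOCK_SIZE]) for i in range(0, len(values), BLOCK_SIZE)]
```

Scaling each small block by its own maximum helps keep accuracy reasonable. A single outlier coarsens only the other values in its 16-value block, not the whole tensor.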
Method
Ollama 0.19 uses MLX for Apple Silicon acceleration and NVFP4 for efficient quantization. It pairs them with a smarter caching system that reuses the cache across conversations and stores cache checkpoints, which reduces repeated prompt processing.
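A toy prefix cache can illustrate the cache reuse described above. This is a hypothetical sketch, not Ollama's implementation. The class name, the checkpoint representation (a string standing in for saved KV state), and the lookup strategy are all assumptions. It shows why a shared system prompt helps: once a checkpoint covers that prefix, later prompts that begin with it need prefill only for their new tokens.

```python
# Hypothetical sketch of prefix-cache reuse across conversations; this is NOT
# Ollama's implementation. If a new prompt starts with tokens whose KV state was
# already computed (e.g. a shared system prompt), only the remaining suffix
# needs prefill. Checkpoints are stored at chosen prefix positions.


class PrefixCache:
    def __init__(self) -> None:
        # Maps a token prefix (as a tuple) to a stand-in for its saved KV state.
        self._checkpoints: dict[tuple[int, ...], str] = {}

    def store_checkpoint(self, tokens: list[int]) -> None:
        """Record a checkpoint covering exactly these tokens."""
        key = tuple(tokens)
        self._checkpoints[key] = f"kv-state[{len(key)} tokens]"

    def longest_cached_prefix(self, tokens: list[int]) -> int:
        """Length of the longest stored checkpoint that is a prefix of tokens."""
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._checkpoints:
                return n
        return 0

    def tokens_to_prefill(self, tokens: list[int]) -> list[int]:
        """Tokens that still need a forward pass after reusing the cache."""
        return tokens[self.longest_cached_prefix(tokens):]
```

The linear scan over prefix lengths keeps the sketch short. A real cache would more likely index prefixes with a trie or radix tree.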
In practice
- Run Qwen3.5-35B-A3B for coding tasks.
- Use NVFP4 models for parity with production inference deployments that also serve NVFP4.
- Requires a Mac with more than 32 GB of unified memory.
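A rough calculation can sanity-check the memory requirement. This sketch makes the following assumptions:
- The model has about 35 billion total parameters, read from its name. In a mixture-of-experts model like this one, "A3B" indicates roughly 3B parameters active per token, but all weights must still be resident in memory.
- NVFP4 stores 4 bits per value plus one 8-bit scale per 16 values.
- The KV cache, activations, and any layers kept at higher precision are ignored.

```python
# Back-of-the-envelope weight-memory estimate. Assumptions (not from the
# article): ~35e9 total parameters (read off the model name), every weight
# stored as NVFP4 (4-bit value + one 8-bit scale per 16 values), and no
# accounting for KV cache, activations, or layers kept at higher precision.


def nvfp4_bits_per_weight(block_size: int = 16, value_bits: int = 4, scale_bits: int = 8) -> float:
    """Average storage bits per weight, amortizing the per-block scale."""
    return value_bits + scale_bits / block_size


def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 35e9  # assumption: total parameter count implied by "35B"
    for label, bits in [("BF16", 16.0), ("NVFP4", nvfp4_bits_per_weight())]:
        print(f"{label:>5}: ~{weight_gb(params, bits):.1f} GB of weights")
```

Under these assumptions the weights take roughly 19.7 GB in NVFP4 versus 70 GB in BF16. That is consistent with needing more than 32 GB of unified memory once the KV cache, runtime, and operating system are added.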
Topics
- Ollama
- Apple Silicon
- MLX Framework
- NVFP4 Quantization
- Qwen3.5-35B-A3B Model
Best for: NLP Engineer, Machine Learning Engineer, AI Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Ollama Blog.