Qwen3.6 (27B) Local Test | Better than Gemma 4? | Coding, OCR, Images with llama.cpp | ๐ด Live
Summary
This content analyzes the performance of the newly released Qwen 3.6 (27B) dense model, contrasting it with the Qwen 3.5 (27B) and Qwen 3.6 (35B) Mixture-of-Experts (MoE) models. The analysis, conducted on a MacBook M4 Pro with 48GB of unified memory, primarily focuses on local inference speed and capabilities using the WAMA CPP server and Q4 KM quantization. Key tests included creative dialogue, image understanding (Bulgarian fridge test), and extensive coding tasks within the Open Code agentic framework. While the Qwen 3.6 (27B) model demonstrated strong reasoning, tool-calling, and multimodal understanding, its inference speed was notably slow at approximately 10 tokens per second, especially compared to the 35B MoE model which achieved 30-40 tokens per second. The model successfully generated a CV/resume website from images and iteratively improved its design using a front-end design skill, despite the performance limitations.
Key takeaway
For AI Engineers evaluating Qwen 3.6 (27B) for local deployment, be aware that while its reasoning and multimodal capabilities are strong, the 4-bit quantized version on M4 Pro hardware yields slow inference speeds (around 10 tokens/sec). Consider the Qwen 3.6 (35B) MoE model for better speed-to-performance ratio, especially for coding tasks. If deep reasoning and multimodal understanding are paramount and speed is secondary, the 27B dense model is viable, but prepare for longer processing times.
Key insights
Qwen 3.6 (27B) offers strong reasoning and multimodal capabilities but suffers from slow local inference speed.
Principles
- Dense models may offer deeper reasoning but can be slower than MoE.
- Quantization significantly impacts local model performance and memory usage.
Method
The analysis used WAMA CPP server with Q4 KM quantization on a MacBook M4 Pro, testing coding, image understanding, and design iteration via Open Code and custom skills.
In practice
- Use 4-bit quantization for Qwen 3.6 (27B) to fit 48GB unified memory.
- Employ agentic frameworks like Open Code for structured model evaluation.
- Configure Open Code for image modality to enable visual tasks.
Topics
- Qwen 3.6 (27B) Model
- Local LLM Inference
- llama.cpp
- Agentic Coding
- Multimodal AI
Best for: Machine Learning Engineer, AI Engineer, AI Student
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.