Qwen3.6 (27B) Local Test | Better than Gemma 4? | Coding, OCR, Images with llama.cpp | 🔴 Live

2026-04-23 · Source: Venelin Valkov · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, extended

Summary

This content analyzes the performance of the newly released Qwen 3.6 (27B) dense model, contrasting it with the Qwen 3.5 (27B) and Qwen 3.6 (35B) Mixture-of-Experts (MoE) models. The analysis, conducted on a MacBook M4 Pro with 48GB of unified memory, primarily focuses on local inference speed and capabilities using the WAMA CPP server and Q4 KM quantization. Key tests included creative dialogue, image understanding (Bulgarian fridge test), and extensive coding tasks within the Open Code agentic framework. While the Qwen 3.6 (27B) model demonstrated strong reasoning, tool-calling, and multimodal understanding, its inference speed was notably slow at approximately 10 tokens per second, especially compared to the 35B MoE model which achieved 30-40 tokens per second. The model successfully generated a CV/resume website from images and iteratively improved its design using a front-end design skill, despite the performance limitations.

Key takeaway

For AI Engineers evaluating Qwen 3.6 (27B) for local deployment, be aware that while its reasoning and multimodal capabilities are strong, the 4-bit quantized version on M4 Pro hardware yields slow inference speeds (around 10 tokens/sec). Consider the Qwen 3.6 (35B) MoE model for better speed-to-performance ratio, especially for coding tasks. If deep reasoning and multimodal understanding are paramount and speed is secondary, the 27B dense model is viable, but prepare for longer processing times.

Key insights

Qwen 3.6 (27B) offers strong reasoning and multimodal capabilities but suffers from slow local inference speed.

Principles

Dense models may offer deeper reasoning but can be slower than MoE.
Quantization significantly impacts local model performance and memory usage.

Method

The analysis used WAMA CPP server with Q4 KM quantization on a MacBook M4 Pro, testing coding, image understanding, and design iteration via Open Code and custom skills.

In practice

Use 4-bit quantization for Qwen 3.6 (27B) to fit 48GB unified memory.
Employ agentic frameworks like Open Code for structured model evaluation.
Configure Open Code for image modality to enable visual tasks.

Topics

Qwen 3.6 (27B) Model
Local LLM Inference
llama.cpp
Agentic Coding
Multimodal AI

Best for: Machine Learning Engineer, AI Engineer, AI Student

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.