Qwen 3.5 Local Test with Ollama | Coding, OCR, Data Extraction, Image Understanding

2026-03-01 · Source: Venelin Valkov · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, long

Summary

Qwen 3.5, the latest model family from Alibaba Cloud, was tested locally using Ollama on an M4 Pro with 48 GB of unified memory. The version evaluated was a 35-billion-parameter Mixture-of-Experts (MoE) model with 3 billion active parameters and a unified vision and language foundation. Trained on extensive image and text data, it showed a significant improvement in image understanding over previous Qwen VL models. It has 256 experts, of which only nine are active during inference, which contributes to its speed, and a 262k-token context window extendable to 1 million tokens. The tests covered recipe generation from an image of fridge contents, UI generation (HTML/Tailwind) from an image, explanation of a Domain-Driven Design diagram, receipt data extraction, and chart data extraction. The model excelled in most tasks, particularly UI generation and diagram explanation, but receipt extraction showed inaccuracies in the total amount.

Key takeaway

For AI engineers evaluating multimodal models for local deployment, Qwen 3.5 is a compelling option thanks to its efficient MoE architecture and strong performance in tasks like UI generation and diagram understanding. Consider the 35-billion-parameter version for applications that need robust image and text processing, but be aware that data extraction from complex visual documents such as receipts may require additional prompt engineering to ensure accuracy.

Key insights

The Qwen 3.5 MoE model offers strong multimodal capabilities and efficient local inference.

Principles

MoE architecture enables fast inference with large parameter counts.
Unified vision-language training improves image understanding.
Context window extension enhances long-range comprehension.
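
To make the first principle concrete, here is a minimal sketch of top-k expert routing using the counts quoted in the summary (256 experts, nine active; 35B total parameters, 3B active). The source does not describe Qwen 3.5's actual router, so the softmax-over-top-k gating, the random logits, and the function name `top_k_route` are illustrative assumptions, and the sketch treats all nine active experts the same way.

```python
import math
import random


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Generic top-k gating for illustration; not Qwen 3.5's confirmed router.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


NUM_EXPERTS = 256    # from the summary
ACTIVE_EXPERTS = 9   # from the summary

# Random logits stand in for a learned router's per-token scores (assumption).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(logits, ACTIVE_EXPERTS)

expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS  # 9 / 256
param_fraction = 3 / 35                         # 3B active of 35B total
print(f"{len(routes)} experts selected, "
      f"{expert_fraction:.1%} of experts, {param_fraction:.1%} of parameters")
# prints: 9 experts selected, 3.5% of experts, 8.6% of parameters
```

Only the selected experts run for a given token, which is why per-token compute tracks the 3B active parameters rather than the full 35B, although all weights still have to fit in memory.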

Method

The model was run locally via Ollama using a quantized 35B-parameter version. It was evaluated on five distinct multimodal tasks for response time and accuracy.
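
A setup like this could be scripted roughly as follows. This is a sketch, not the author's test harness: it assumes Ollama is serving on its default local port and uses the `/api/generate` endpoint with a base64-encoded image. The model tag `qwen3.5:35b`, the file name `fridge.jpg`, and the helper names are assumptions; check `ollama list` for the tag actually installed.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "qwen3.5:35b"  # assumed tag, not confirmed by the source


def build_payload(prompt, image_bytes, model=MODEL_TAG):
    """Build a non-streaming generate request with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def tokens_per_second(response):
    """Generation speed from Ollama's response stats (eval_duration is in nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def ask(prompt, image_path):
    """Send the prompt and image to the local Ollama server and return the parsed reply."""
    with open(image_path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())


# Example usage (requires a running Ollama server and a local image file):
# reply = ask("Suggest a recipe from what you see in this fridge.", "fridge.jpg")
# print(reply["response"])
# print(f"{tokens_per_second(reply):.1f} tokens/s")
```

Response time is read from the statistics Ollama returns with each reply; accuracy still has to be judged by comparing the output against the source image.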

In practice

Run Qwen 3.5 locally on a machine with 48 GB of unified memory, as in the test.
Use it for UI generation from images and for diagram interpretation.
Consider prompt tuning to improve data extraction accuracy.
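
Alongside prompt tuning, extracted receipt data can be checked arithmetically, since the test found errors in the total amount. The sketch below assumes the model was prompted to return JSON with a hypothetical schema (`items` with `price` and optional `quantity`, plus `tax` and `total`); neither the schema nor the function name `check_receipt` comes from the source.

```python
import json
from decimal import Decimal


def check_receipt(raw_json, tolerance=Decimal("0.01")):
    """Compare the reported total with the sum of line items plus tax.

    Assumes a hypothetical schema:
    {"items": [{"price": ..., "quantity": ...}], "tax": ..., "total": ...}
    """
    # Parse numbers as Decimal to avoid float rounding on currency values.
    receipt = json.loads(raw_json, parse_float=Decimal, parse_int=Decimal)
    items_sum = sum(
        (item["price"] * item.get("quantity", Decimal(1)) for item in receipt["items"]),
        Decimal(0),
    )
    expected = items_sum + receipt.get("tax", Decimal(0))
    difference = abs(expected - receipt["total"])
    return {
        "ok": difference <= tolerance,
        "expected_total": expected,
        "reported_total": receipt["total"],
    }


# Made-up model output for illustration only.
sample = """
{"items": [{"name": "Milk", "price": 2.50, "quantity": 2},
           {"name": "Bread", "price": 3.25}],
 "tax": 0.75,
 "total": 9.50}
"""
result = check_receipt(sample)
print(result["ok"], result["expected_total"], result["reported_total"])
```

A failed check does not say which field is wrong, only that the extraction is inconsistent and worth re-prompting or reviewing by hand.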

Topics

Qwen 3.5
Mixture-of-Experts
Multimodal AI
Local LLM Deployment
Image Understanding

Best for: AI Engineer, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.