Qwen 3.5 Local Test with Ollama | Coding, OCR, Data Extraction, Image Understanding
Summary
The Qwen 3.5 model, the latest family of models from Alibaba Cloud, was tested locally using Ollama on an M4 Pro with 48 GB of unified memory. The specific version evaluated was a 35 billion parameter Mixture-of-Experts (MoE) model with 3 billion active parameters, featuring a unified vision and language foundation. This model, trained with extensive image and text data, demonstrated a significant improvement in image understanding over previous Qwen VL models. It utilizes 256 experts with only nine active during inference, contributing to its fast performance, and has a 262k context window extendable to 1 million tokens. Performance tests included recipe generation from an image of fridge contents, UI generation (HTML/Tailwind) from an image, explanation of a Domain-Driven Design diagram, receipt data extraction, and chart data extraction. While it excelled in most tasks, particularly UI generation and diagram explanation, receipt extraction showed inaccuracies in total amount.
Key takeaway
For AI Engineers evaluating multimodal models for local deployment, Qwen 3.5 presents a compelling option due to its efficient MoE architecture and strong performance in tasks like UI generation and diagram understanding. You should consider its 35 billion parameter version for applications requiring robust image and text processing, but be aware that data extraction from complex visual documents like receipts may require additional prompt engineering to ensure accuracy.
Key insights
Qwen 3.5 MoE model offers strong multimodal capabilities and efficient local inference.
Principles
- MoE architecture enables fast inference with large parameter counts.
- Unified vision-language training improves image understanding.
- Context window extension enhances long-range comprehension.
Method
The model was run locally via Ollama, using a quantized 35B parameter version. Performance was evaluated across five distinct multimodal tasks, measuring response time and accuracy.
In practice
- Run Qwen 3.5 locally on machines with 48GB unified memory.
- Utilize for UI generation from images and diagram interpretation.
- Consider prompt tuning for improved data extraction accuracy.
Topics
- Qwen 3.5
- Mixture-of-Experts
- Multimodal AI
- Local LLM Deployment
- Image Understanding
Best for: AI Engineer, Machine Learning Engineer, AI Student
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Venelin Valkov.