Qwen3-VL Accuracy Differences on Ollama vs MLX

2025-11-04 · Source: Andrej Baranovskij · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, short

Summary

A comparison of the Qwen3-VL 30B vision language model running on MLX (through MLX VLM) and on Ollama reveals significant differences in both accuracy and performance. The model, using 8-bit quantization, was tested on a Mac Mini M4 with 64GB RAM against a financial statement document. The MLX VLM implementation consistently produced inaccurate results: it omitted rows such as "marketable securities" and returned incorrect values for "deferred tax assets" and "other non-current assets," completing the task in approximately 37-39 seconds. The Ollama implementation consumed more memory (up to 95% versus 75% for MLX) but delivered 100% accurate results, correctly identifying all financial line items and values. Its initial run took around 58 seconds, and subsequent cached runs took 29-35 seconds.

Key takeaway

MLOps engineers deploying vision language models should validate accuracy and performance directly on the intended production inference platform. Developing and testing on one platform and deploying to another can cause unexpected, critical accuracy degradation, as the Qwen3-VL 30B model's differing results on MLX and Ollama show. Always run end-to-end tests in the production environment to ensure reliable outcomes.

Key insights

The same vision language model can yield different accuracy and performance across platforms.

Principles

Model conversion can impact quality.
Platform choice affects model behavior.

Method

The comparison ran Qwen3-VL 30B (8-bit quantized) on MLX VLM and on Ollama against the same financial document, querying specific data points and evaluating output accuracy and execution time.
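
The accuracy check described above can be approximated with a small scoring routine. This is a minimal sketch, not the method used in the original article: the function and class names (`score_extraction`, `ExtractionReport`, `normalize`) are hypothetical, and the numeric values are placeholders rather than figures from the tested document. It separates the two failure modes reported for MLX VLM, namely missing rows and wrong values.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionReport:
    missing: list = field(default_factory=list)     # expected rows absent from the output
    mismatched: dict = field(default_factory=dict)  # label -> (expected, got)
    matched: int = 0
    total: int = 0

    @property
    def accuracy(self) -> float:
        return self.matched / self.total if self.total else 0.0


def normalize(value: str) -> str:
    """Strip whitespace, '$' signs, and thousands separators before comparing."""
    return value.strip().replace(",", "").replace("$", "")


def score_extraction(expected: dict, extracted: dict) -> ExtractionReport:
    """Compare a model's extracted line items against hand-verified ground truth."""
    report = ExtractionReport(total=len(expected))
    # Match labels case-insensitively so "Total assets" equals "total assets".
    lookup = {k.strip().lower(): v for k, v in extracted.items()}
    for label, want in expected.items():
        got = lookup.get(label.strip().lower())
        if got is None:
            report.missing.append(label)
        elif normalize(got) != normalize(want):
            report.mismatched[label] = (want, got)
        else:
            report.matched += 1
    return report


# Placeholder ground truth and model output (assumed values, for illustration only).
expected = {
    "Marketable securities": "1,000",
    "Deferred tax assets": "2,000",
    "Other non-current assets": "3,000",
    "Total assets": "6,000",
}
extracted = {
    "Deferred tax assets": "2,500",
    "Other non-current assets": "3,000",
    "total assets": "6000",
}

report = score_extraction(expected, extracted)
print("missing:", report.missing)
print("mismatched:", report.mismatched)
print(f"accuracy: {report.accuracy:.0%}")
```

Reporting missing rows separately from mismatched values makes it easier to tell whether an engine is skipping content or misreading it.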

In practice

Test models on target production platforms.
Verify accuracy across different inference engines.
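
One way to put these two recommendations into practice is a harness that sends the same prompt to each inference engine, times every run, and blocks deployment if the target platform falls below an accuracy threshold. The sketch below is an assumption-laden illustration: `benchmark`, `release_gate`, and the two stub engines are hypothetical names, and the stubs stand in for real MLX VLM or Ollama calls, which are not shown here. Values are placeholders.

```python
import time
from typing import Callable, Dict, List


def benchmark(backends: Dict[str, Callable[[str], dict]],
              prompt: str,
              expected: dict,
              runs: int = 3) -> List[dict]:
    """Run the same prompt on every backend and record time and accuracy per run."""
    if not expected:
        raise ValueError("expected must contain at least one ground-truth item")
    results = []
    for name, run_backend in backends.items():
        for i in range(runs):
            start = time.perf_counter()
            extracted = run_backend(prompt)
            elapsed = time.perf_counter() - start
            correct = sum(1 for k, v in expected.items() if extracted.get(k) == v)
            results.append({
                "backend": name,
                "run": i + 1,  # run 1 may be a cold start; later runs may be cached
                "seconds": elapsed,
                "accuracy": correct / len(expected),
            })
    return results


def release_gate(results: List[dict], backend: str, threshold: float = 1.0) -> bool:
    """Pass only if every recorded run on the target backend meets the threshold."""
    rows = [r for r in results if r["backend"] == backend]
    return bool(rows) and all(r["accuracy"] >= threshold for r in rows)


# Placeholder ground truth (assumed values, not from the tested document).
expected = {"Marketable securities": "1000", "Deferred tax assets": "2000"}


def stub_engine_a(prompt: str) -> dict:
    # Stand-in for an engine that returns every expected row correctly.
    return {"Marketable securities": "1000", "Deferred tax assets": "2000"}


def stub_engine_b(prompt: str) -> dict:
    # Stand-in for an engine that drops a row and misreads a value.
    return {"Deferred tax assets": "2500"}


results = benchmark(
    {"engine_a": stub_engine_a, "engine_b": stub_engine_b},
    prompt="Extract the listed balance sheet items as JSON.",
    expected=expected,
)
for row in results:
    print(row)
print("engine_a passes:", release_gate(results, "engine_a"))
print("engine_b passes:", release_gate(results, "engine_b"))
```

Keeping the run index in each record lets the first (possibly uncached) run be reported separately from later ones, which matters given the gap between initial and cached run times noted in the summary.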

Topics

Qwen3-VL
MLX Framework
Ollama Platform
Vision Language Models
Model Accuracy

Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Andrej Baranovskij.