Ollama and MLX-VLM Accuracy Review (Qwen3-VL and Mistral Small 3.2)

2025-11-26 · Source: Andrej Baranovskij · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, long

Summary

This review compares two small vision-language models (VLMs), Qwen3-VL and Mistral Small 3.2, on how accurately they extract structured data from financial documents. Both models were run on the Ollama and MLX-VLM frameworks with 8-bit and 4-bit quantization. On simple bond tables, both models performed well on both frameworks, and MLX-VLM was faster: Qwen3-VL took 24 seconds on MLX-VLM versus 35 seconds on Ollama. On more complex financial statements and bank statements, however, MLX-VLM consistently failed to extract the data correctly with both Qwen3-VL and Mistral Small 3.2. Ollama stayed highly accurate for both models on these complex documents, at the cost of longer inference times (e.g., 59 seconds for Qwen3-VL on financial statements). The analysis suggests that MLX-VLM struggles with larger tables and sparse data.

Key takeaway

For AI engineers and data scientists building structured data extraction systems: if your use case involves complex financial documents or sparse tables, prioritize Ollama for its superior accuracy, even though inference may take longer. MLX-VLM is faster, but its current accuracy limitations on intricate data make it less suitable for production systems that need high fidelity on such tasks. Evaluate the complexity of your data before committing to a VLM framework.

Key insights

MLX-VLM offers faster VLM inference than Ollama but is less accurate on complex, sparse structured data.

Principles

Quantization impacts memory and performance.
Framework choice affects VLM accuracy.
Complexity of data influences extraction reliability.
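
The first principle can be made concrete with back-of-the-envelope arithmetic. The sketch below estimates the memory needed for model weights alone at different bit widths. The 8-billion parameter count is a hypothetical figure chosen for illustration, not a number from the review, and the estimate ignores quantization metadata (scales and zero points), the KV cache, and activations.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count for illustration only; not taken from the review.
PARAMS = 8e9

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

Halving the bit width halves the weight footprint, which is why 4-bit variants fit on smaller machines. Whether accuracy holds up at the lower precision is something to measure per model and per document type.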

Method

The study ran the Qwen3-VL and Mistral Small 3.2 VLMs on Ollama and MLX-VLM with 8-bit and 4-bit quantization, extracting structured data from bond tables, financial statements, and bank statements, and measured accuracy and inference time for each combination.
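
The summary does not say how accuracy was scored. One simple way to score this kind of extraction is field-level matching against a hand-checked reference, sketched below. The function names, the normalization rule, and the sample record are all assumptions made for illustration; the values are made up and are not results from the review.

```python
from typing import Any


def normalize(value: Any) -> str:
    """Compare values as trimmed, lowercased strings with commas removed.

    Assumption: commas are thousands separators; adjust for other locales.
    """
    return str(value).strip().lower().replace(",", "")


def field_accuracy(expected: dict[str, Any], extracted: dict[str, Any]) -> float:
    """Fraction of expected fields whose extracted value matches after normalization."""
    if not expected:
        return 1.0
    hits = sum(
        1
        for key, want in expected.items()
        if key in extracted and normalize(extracted[key]) == normalize(want)
    )
    return hits / len(expected)


# Made-up reference record and model output, for illustration only.
expected = {"instrument": "Bond A", "coupon": "4.5", "maturity": "2030-01-15", "amount": "1,000,000"}
extracted = {"instrument": "Bond A", "coupon": "4.5", "maturity": "2030-01-15", "amount": "100000"}

score = field_accuracy(expected, extracted)
print(f"field accuracy: {score:.2f}")
```

Here three of four fields match, so the score is 0.75. A dropped digit in an amount counts as a miss, which is the kind of error that matters most on financial documents.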

In practice

Use Ollama when structured data extraction must be highly accurate, especially on complex or sparse tables.
Consider MLX-VLM for simpler, speed-critical tasks.
Test VLMs with diverse data complexity.
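
Testing across data complexity can be organized as a small harness that groups documents by category and reports accuracy and latency for each group. The sketch below is framework-agnostic: `extract_fn` is a hypothetical callable that you would implement around your own Ollama or MLX-VLM call, and the stub extractor, file names, and records here are invented purely to make the example runnable.

```python
import time
from collections import defaultdict
from typing import Callable


def benchmark(extract_fn: Callable[[str], dict], cases: list[dict]) -> dict[str, dict[str, float]]:
    """Run extract_fn on each case; report exact-match rate and mean latency per category."""
    matches: dict[str, list[float]] = defaultdict(list)
    latencies: dict[str, list[float]] = defaultdict(list)
    for case in cases:
        start = time.perf_counter()
        result = extract_fn(case["path"])
        latencies[case["category"]].append(time.perf_counter() - start)
        matches[case["category"]].append(float(result == case["expected"]))
    return {
        cat: {
            "exact_match_rate": sum(matches[cat]) / len(matches[cat]),
            "mean_seconds": sum(latencies[cat]) / len(latencies[cat]),
        }
        for cat in matches
    }


def stub_extract(path: str) -> dict:
    """Stand-in for a real VLM call: right on the simple file, wrong on the complex one."""
    canned = {"bond.png": {"coupon": "4.5"}, "bank.png": {"balance": "0"}}
    return canned[path]


# Made-up test cases; replace with your own documents and reference outputs.
cases = [
    {"path": "bond.png", "category": "simple", "expected": {"coupon": "4.5"}},
    {"path": "bank.png", "category": "complex", "expected": {"balance": "1250.00"}},
]

report = benchmark(stub_extract, cases)
for category, stats in report.items():
    print(category, stats)
```

Running the same cases once per framework and quantization level gives a per-category view, so a framework that looks fine on simple tables but fails on complex ones shows up immediately.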

Topics

Vision Language Models
Structured Data Extraction
MLX-VLM Performance
Ollama Performance
Qwen3-VL and Mistral Small 3.2 Models

Best for: AI Engineer, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Andrej Baranovskij.