Ollama and MLX-VLM Accuracy Review (Qwen3-VL and Mistral Small 3.2)
Summary
This review compares two small vision-language models (VLMs), Qwen3-VL and Mistral Small 3.2, on structured data extraction from financial documents. Both models were run under Ollama and MLX-VLM with 8-bit and 4-bit quantization. On simple bond tables, both models were accurate under both frameworks, and MLX-VLM was faster (24 seconds for Qwen3-VL on MLX-VLM versus 35 seconds on Ollama). On more complex financial statements and bank statements, MLX-VLM consistently failed to extract the data correctly with either model. Ollama remained accurate with both models on these documents, at the cost of longer inference times (for example, 59 seconds for Qwen3-VL on financial statements). The results suggest that MLX-VLM struggles with larger tables and sparse data.
Key takeaway
For AI engineers and data scientists building structured data extraction systems: if your use case involves complex financial documents or sparse tables, prioritize Ollama for its higher accuracy, despite the longer inference times. MLX-VLM is faster, but its current accuracy limitations on complex data make it less suitable for production systems that require high fidelity. Evaluate the complexity of your data before committing to a VLM framework.
Key insights
MLX-VLM offers faster VLM inference than Ollama but is less accurate on complex, sparse structured data.
Principles
- Quantization impacts memory and performance.
- Framework choice affects VLM accuracy.
- Complexity of data influences extraction reliability.
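To make the first principle concrete, here is a rough back-of-the-envelope sketch of how bit width changes the memory needed for model weights. It counts the weights only and ignores quantization scales, the KV cache, and activations, so real usage will be higher. The parameter count is illustrative and is not taken from the source.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1e9


# Illustrative parameter count (an assumption, not a figure from the review).
N_PARAMS = 8e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Halving the bit width halves the weight footprint. Whether accuracy holds up at the lower width is an empirical question that the estimate cannot answer.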
Method
The study compared the Qwen3-VL and Mistral Small 3.2 VLMs on Ollama and MLX-VLM, using 8-bit and 4-bit quantization, for structured data extraction from bond tables, financial statements, and bank statements. It measured extraction accuracy and inference time.
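The source does not say how accuracy was scored. One plausible approach, sketched here as an assumption, is field-level exact match: flatten the expected and extracted JSON into leaf fields and count how many expected fields were reproduced. The function names and the sample bond rows are hypothetical.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into {"path": leaf_value} pairs."""
    if isinstance(value, dict):
        items = ((f"{prefix}.{k}" if prefix else str(k), v) for k, v in value.items())
    elif isinstance(value, list):
        items = ((f"{prefix}[{i}]", v) for i, v in enumerate(value))
    else:
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for path, child in items:
        flat.update(flatten(child, path))
    return flat


def field_accuracy(expected: Any, predicted: Any) -> float:
    """Fraction of expected leaf fields that the prediction reproduces exactly."""
    exp, pred = flatten(expected), flatten(predicted)
    if not exp:
        return 1.0
    missing = object()  # sentinel so that an absent field never equals None
    correct = sum(1 for path, val in exp.items() if pred.get(path, missing) == val)
    return correct / len(exp)


# Hypothetical ground truth and model output for a two-row bond table.
expected = {"rows": [{"instrument": "Bond A", "coupon": "4.5%"},
                     {"instrument": "Bond B", "coupon": "3.0%"}]}
predicted = {"rows": [{"instrument": "Bond A", "coupon": "4.5%"},
                      {"instrument": "Bond B", "coupon": ""}]}

print(field_accuracy(expected, predicted))
```

Exact match is strict: it penalizes harmless formatting differences such as "3.0%" versus "3%", so a real evaluation might normalize values before comparing.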
In practice
- Use Ollama for high-accuracy structured data extraction.
- Consider MLX-VLM for simpler, speed-critical tasks.
- Test VLMs with diverse data complexity.
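The last recommendation can be turned into a small harness that reports accuracy and latency per complexity tier rather than as one overall number. This is a minimal sketch: `extract` stands for whatever function calls your model (through Ollama, MLX-VLM, or anything else), and the stub extractor, file paths, and tier names are all hypothetical.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

Doc = Tuple[str, str, dict]  # (complexity tier, document path, expected fields)


def evaluate(extract: Callable[[str], dict],
             docs: Iterable[Doc]) -> Dict[str, Dict[str, float]]:
    """Run the extractor on every document and aggregate results per tier."""
    per_tier = defaultdict(lambda: {"docs": 0, "exact": 0, "seconds": 0.0})
    for tier, path, expected in docs:
        start = time.perf_counter()
        predicted = extract(path)
        elapsed = time.perf_counter() - start
        stats = per_tier[tier]
        stats["docs"] += 1
        stats["exact"] += int(predicted == expected)
        stats["seconds"] += elapsed
    return {
        tier: {
            "exact_match_rate": s["exact"] / s["docs"],
            "mean_seconds": s["seconds"] / s["docs"],
        }
        for tier, s in per_tier.items()
    }


# Stub standing in for a real model call: right on the simple table,
# wrong on the complex statement.
def fake_extract(path: str) -> dict:
    return {"total": "100"} if "bond" in path else {"total": ""}


DOCS = [
    ("simple", "bond_table.png", {"total": "100"}),
    ("complex", "bank_statement.png", {"total": "2,450.10"}),
]

results = evaluate(fake_extract, DOCS)
for tier, metrics in results.items():
    print(tier, metrics)
```

Reporting per tier keeps a strong score on simple tables from hiding failures on larger or sparser documents, which is the pattern the review describes for MLX-VLM.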
Topics
- Vision Language Models
- Structured Data Extraction
- MLX-VLM Performance
- Ollama Performance
- Qwen3-VL and Mistral Small 3.2 Models
Best for: AI Engineer, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Andrej Baranovskij.