Gemma 4 for Structured Data Extraction: Can It Beat Qwen 3.5?

2026-04-21 · Source: Andrej Baranovskij · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, medium

Summary

A comparative analysis evaluates Google's Gemma 4 models against Qwen 3.5 and Mistral models for structured data extraction from a bank statement with missing values. All models were 8-bit quantized and run on an Apple Mac Mini M4 Pro. The Gemma 4 26B Mixture-of-Experts (MoE) model, with 4 billion active parameters, processed the document in 33 seconds with high accuracy. The Gemma 4 31B dense model consumed 52GB of RAM and completed the task in 150 seconds, also with good accuracy. In contrast, a 14B-parameter Mistral model failed to extract all records correctly, taking 82 seconds and using 35GB of memory. The Qwen 3.5 35B MoE model, with 3 billion active parameters, achieved good accuracy in 39 seconds while consuming 55GB of RAM. Overall, Gemma 4 is comparable to Qwen 3.5 but does not surpass Mistral Small 3.2 for structured data extraction.

Key takeaway

For AI Engineers evaluating LLMs for structured data extraction, particularly from documents with sparse tables, consider Gemma 4 and Qwen 3.5 as comparable options. However, if your primary goal is maximum accuracy for invoice-like structured data, Mistral Small 3.2 remains a stronger contender. Benchmark these models on your specific data types to validate performance and resource requirements, especially for 8-bit quantized versions on local hardware.

Key insights

Gemma 4 models perform comparably to Qwen 3.5 for structured data extraction but do not outperform Mistral Small 3.2.

Principles

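MoE models with few active parameters can run inference much faster than dense models of similar total size (33 and 39 seconds versus 150 seconds in this test).
Smaller models may struggle with sparse tabular data; the 14B Mistral model missed records here.
8-bit quantization lets 30B-class models fit on consumer hardware.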
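
The arithmetic behind the last principle can be sketched in a few lines. This is a rough, weight-only estimate, not a measurement: the function names are hypothetical, and the calculation ignores the KV cache, activations, and runtime overhead, which is presumably why the RAM figures reported in the summary (52GB and 55GB) are well above it. The source does not break those figures down.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: int = 8) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same bit width. Real
    quantized checkpoints often keep some tensors at higher precision.
    """
    bytes_per_weight = bits_per_weight / 8
    return total_params_billion * bytes_per_weight


def active_fraction(active_params_billion: float, total_params_billion: float) -> float:
    """Share of an MoE model's parameters used for each token."""
    return active_params_billion / total_params_billion


# Total parameter counts as stated in the summary above.
for name, total in [("Gemma 4 31B dense", 31), ("Qwen 3.5 35B MoE", 35)]:
    at_8_bit = weight_memory_gb(total, 8)
    at_16_bit = weight_memory_gb(total, 16)
    print(f"{name}: ~{at_8_bit:.0f} GB at 8-bit, ~{at_16_bit:.0f} GB at 16-bit (weights only)")

# An MoE model keeps all experts in memory, so its footprint follows the
# total parameter count, while per-token compute follows the active count.
print(f"Gemma 4 26B MoE active share: {active_fraction(4, 26):.0%}")
```

The split between the two functions mirrors the split in the results: memory use tracks total parameters, while inference time tracks active parameters.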

Method

The evaluation involved testing 8-bit quantized large language models (LLMs) on a single bank statement document containing tables with missing values, measuring both extraction accuracy and inference time on an Apple Mac Mini M4 Pro.
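
The source does not say how extraction accuracy was scored. One plausible approach, sketched below under that assumption, is a field-level exact match against hand-labeled records, with `None` marking an empty table cell. The field names and sample rows are hypothetical, not taken from the bank statement used in the evaluation.

```python
from typing import Optional

Record = dict[str, Optional[str]]


def field_accuracy(expected: list[Record], predicted: list[Record]) -> float:
    """Fraction of expected fields reproduced exactly, compared row by row.

    A cell that is empty in the document is represented as None, so a model
    that shifts a value into the wrong column is penalized twice: once for
    the wrongly filled cell and once for the wrongly empty one.
    """
    total = 0
    correct = 0
    for index, expected_row in enumerate(expected):
        total += len(expected_row)
        if index >= len(predicted):
            continue  # a missing row scores zero for all of its fields
        predicted_row = predicted[index]
        for field, expected_value in expected_row.items():
            # An omitted key is treated the same as an explicit None.
            if predicted_row.get(field) == expected_value:
                correct += 1
    return correct / total if total else 1.0


# Hypothetical ground truth: each row fills only one of debit/credit.
expected_rows = [
    {"date": "2026-01-02", "debit": None, "credit": "100.00"},
    {"date": "2026-01-03", "debit": "40.00", "credit": None},
]

# Hypothetical model output with the second amount in the wrong column.
predicted_rows = [
    {"date": "2026-01-02", "debit": None, "credit": "100.00"},
    {"date": "2026-01-03", "debit": None, "credit": "40.00"},
]

score = field_accuracy(expected_rows, predicted_rows)
print(f"field accuracy: {score:.2f}")
```

Scoring per field rather than per document makes column-shift errors visible, which is the typical failure mode on tables with missing values.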

In practice

Consider MoE architectures for faster inference.
Use 8-bit quantization to run 30B+ models on 64GB RAM.
Benchmark models on sparse tabular data for extraction tasks.
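
A minimal timing harness for such a benchmark might look like the sketch below. It assumes each model is wrapped in a callable that takes the document text and returns a list of extracted records; the wrapper functions here are stand-in stubs, not real model calls, and every name is hypothetical.

```python
import time
from typing import Callable

Extractor = Callable[[str], list[dict]]


def benchmark(models: dict[str, Extractor], document: str) -> list[tuple[str, float, int]]:
    """Run each extractor once and return (name, seconds, record_count), fastest first."""
    results = []
    for name, extract in models.items():
        start = time.perf_counter()
        records = extract(document)
        elapsed = time.perf_counter() - start
        results.append((name, elapsed, len(records)))
    return sorted(results, key=lambda row: row[1])


# Stub extractors standing in for real local model calls (assumption:
# in practice each would send the document to a locally served model).
def stub_model_a(document: str) -> list[dict]:
    return [{"date": "2026-01-02", "debit": None, "credit": "100.00"}]


def stub_model_b(document: str) -> list[dict]:
    return []


results = benchmark({"model-a": stub_model_a, "model-b": stub_model_b}, "sample statement text")
for name, seconds, count in results:
    print(f"{name}: {seconds:.4f}s, {count} records")
```

A single run per model, as here, is noisy; repeating each run and reporting the median would give steadier numbers on local hardware.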

Topics

Gemma 4
Qwen 3.5
Mistral Models
Structured Data Extraction
Mixture-of-Experts

Best for: Machine Learning Engineer, AI Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Andrej Baranovskij.