From Flutter On-Device to Gemini Vision: Building a Multi-Language Document Extraction Engine

Source: LLM on Medium · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics) · Depth: Intermediate, long

Summary

A solo developer built a multi-language document extraction engine that supports 5 languages and more than 20 document formats. It reaches 85-95% transaction extraction accuracy and 92% category accuracy. Early on, accuracy was stuck at 40%. Flutter on-device text extraction failed on image-based PDFs, and Google Document AI distorted table structures and overfit to English formats. The breakthrough was a hybrid Vision LLM approach, with Gemini 2.5 Flash as the primary model and Claude Sonnet as the fallback. The model receives PDF page images for structural understanding, alongside raw OCR text and numeric tokens for precise financial values. This cut manual review to 10-15% and removed the need for code changes when new document formats appear. Aggressive caching and targeted repair calls further reduced costs and improved reliability.
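The summary does not say how the numeric tokens are produced. Because the engine handles five languages, one plausible step is to normalize locale-specific amounts (`1.234,56`, `1,234.56`, `1 234,56`) into a single representation before they reach the model. This is a heuristic sketch under that assumption. The regex and the `normalize_amount` and `extract_numeric_tokens` helpers are hypothetical and are not the author's code.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import List, Optional

# Hypothetical heuristic pattern: 1-3 digits followed by one or more thousands
# groups (".", ",", "'", space or NBSP plus 3 digits), or a plain digit run.
# Both forms may end in a 1-2 digit decimal part. Signs are ignored here.
_AMOUNT_RE = re.compile(r"(?:\d{1,3}(?:[.,' \u00a0]\d{3})+|\d+)(?:[.,]\d{1,2})?")


def normalize_amount(token: str) -> Optional[Decimal]:
    """Convert a locale-formatted amount such as '1.234,56' or '1,234.56' to Decimal."""
    compact = token.replace(" ", "").replace("\u00a0", "").replace("'", "")
    # The rightmost "." or "," counts as the decimal separator only when 1-2
    # digits follow it; otherwise every separator is treated as thousands grouping.
    last_sep = max(compact.rfind("."), compact.rfind(","))
    if last_sep != -1 and len(compact) - last_sep - 1 in (1, 2):
        integer_part = compact[:last_sep].replace(".", "").replace(",", "")
        normalized = f"{integer_part}.{compact[last_sep + 1:]}"
    else:
        normalized = compact.replace(".", "").replace(",", "")
    try:
        return Decimal(normalized)
    except InvalidOperation:
        return None


def extract_numeric_tokens(ocr_text: str) -> List[Decimal]:
    """Return every amount-like token found in the OCR text, normalized to Decimal."""
    tokens = []
    for match in _AMOUNT_RE.finditer(ocr_text):
        value = normalize_amount(match.group())
        if value is not None:
            tokens.append(value)
    return tokens
```

The heuristic has known weak spots. A final group of exactly three digits (`1.234`) is read as thousands grouping, which is wrong for a German `1,234` meaning 1.234. Space grouping can also merge adjacent numbers, such as a quantity followed by a price. A real pipeline would likely add per-document locale hints.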

Key takeaway

For AI engineers building document extraction systems: adopt a hybrid Vision LLM and OCR approach from the outset, using the Vision LLM for structural understanding and OCR for numeric precision. This combination delivers high accuracy across diverse document formats and languages, minimizes development time for new formats, and reduces manual review. It also avoids the pitfalls of text-only parsing and on-device solutions.

Key insights

A hybrid of Vision LLM and OCR excels at extracting structured data from diverse, unstructured documents because it pairs visual layout understanding with numeric precision.
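One way to make the "numeric precision" half concrete is to cross-check every amount the Vision LLM returns against the amounts parsed from the OCR layer, and route mismatches to manual review. This is a sketch under that assumption. The `Transaction` type and the `split_by_anchor` helper are hypothetical, and the source does not describe its exact validation rule.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable, List, Tuple


@dataclass
class Transaction:
    # Hypothetical shape of one row returned by the Vision LLM.
    date: str
    description: str
    amount: Decimal


def split_by_anchor(
    transactions: Iterable[Transaction], ocr_amounts: Iterable[Decimal]
) -> Tuple[List[Transaction], List[Transaction]]:
    """Accept rows whose amount appears among the OCR numeric tokens; flag the rest."""
    # Compare absolute values so debit/credit sign conventions do not matter.
    # Equal Decimals hash equally, so 12.50 and 12.5 match in the set.
    anchors = {abs(a) for a in ocr_amounts}
    accepted: List[Transaction] = []
    needs_review: List[Transaction] = []
    for tx in transactions:
        if abs(tx.amount) in anchors:
            accepted.append(tx)
        else:
            needs_review.append(tx)
    return accepted, needs_review
```

Rows that fail the check are candidates for the manual-review queue rather than silent corrections. The OCR layer only proves that a number exists on the page, not that the model assigned it to the right row.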

Principles

Method

Send a Vision LLM three inputs: PDF page images for structure, raw OCR text for context, and parsed numeric tokens to anchor the extracted amounts. Wrap the calls in a multi-stage fallback chain with targeted repair calls and aggressive caching.
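The Method combines a fallback chain, targeted repair calls, and caching. This is a minimal, provider-agnostic sketch of that control flow. It assumes each model is wrapped as a callable that takes a prompt plus optional document bytes and returns raw text. The wrappers, the JSON shape with a `transactions` key, the repair prompt wording, and the in-memory cache are all assumptions. No real Gemini or Anthropic SDK calls are shown, and the model names in the usage are labels only.

```python
import hashlib
import json
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model wrapper: (prompt, document bytes or None) -> raw text.
# Real code would wrap the Gemini 2.5 Flash and Claude Sonnet SDK clients.
Model = Callable[[str, Optional[bytes]], str]

# In-memory stand-in for a persistent cache (e.g. a database table).
_cache: Dict[str, dict] = {}


def cache_key(document: bytes, prompt: str) -> str:
    """Hash the prompt and document so identical requests hit the cache."""
    return hashlib.sha256(prompt.encode("utf-8") + b"\x00" + document).hexdigest()


def try_parse(raw: str) -> Optional[dict]:
    """Return the parsed object if it is JSON with a 'transactions' key, else None."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(parsed, dict) and "transactions" in parsed:
        return parsed
    return None


def extract(document: bytes, prompt: str, models: List[Tuple[str, Model]]) -> dict:
    """Try each model in order, allowing one text-only repair call per model."""
    key = cache_key(document, prompt)
    if key in _cache:
        return _cache[key]
    for name, model in models:
        try:
            raw = model(prompt, document)
            result = try_parse(raw)
            if result is None:
                # Targeted repair: resend only the malformed output, not the images.
                repair = ("Return only a valid JSON object with a 'transactions' key. "
                          "Fix this output:\n" + raw)
                result = try_parse(model(repair, None))
        except Exception:
            # Network or quota errors: move on to the next model in the chain.
            continue
        if result is not None:
            result["model"] = name
            _cache[key] = result
            return result
    raise RuntimeError("all models failed; route the document to manual review")
```

Sending the repair prompt without the page images keeps it cheap, because only the malformed text is resent. Keying the cache on a hash of the document bytes plus the prompt means a re-uploaded statement costs nothing, while editing the prompt naturally invalidates old entries.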

Best for: AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by LLM on Medium.