Smart Document Extraction with Business Rules — Gemma vs Qwen vs Ministral

Source: Andrej Baranovskij · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, medium

Summary

The Sparrow platform introduces "LM hints," a JSON-based mechanism that lets users instruct large language models (LLMs) to generate or format data fields during query execution, even when those fields do not exist in the original document. Hints can derive new fields, such as a "risk category" computed from profit/loss values, or reformat existing data, for example instrument names or numerical values rendered in European format. The system passes the hints as a separate parameter alongside the query, and the LLM interprets them to produce the desired output. The article demonstrates this by testing Gemma 4 (31B dense), Qwen 3.6 (27B dense), and Ministral (14B, 8-bit quantized), comparing their accuracy and performance in applying these rules to a banking table of instrument positions.

Key takeaway

For AI Engineers or ML Directors evaluating LLM integration for data processing, Sparrow's LM hints offer a powerful way to extend LLM capabilities beyond simple extraction. You should consider using this functionality for logical rule-based calculations and data formatting directly within the query, rather than post-processing. However, for complex mathematical operations, offloading calculations to a backend Python script remains advisable to ensure accuracy.
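As a rough illustration of the backend approach, the sketch below post-processes extracted rows in Python instead of asking the model to do the arithmetic. The row layout, the field name `profit_loss`, and both helper functions are assumptions made for this example; they are not taken from Sparrow or from the original article.

```python
from decimal import Decimal


def to_european(value: Decimal) -> str:
    """Render a number as 1.234,56 by formatting US-style and swapping separators."""
    us_style = f"{value:,.2f}"
    return us_style.translate(str.maketrans(",.", ".,"))


def total_profit_loss(rows: list[dict]) -> Decimal:
    """Sum the (hypothetical) profit_loss field exactly, avoiding float rounding."""
    return sum((Decimal(str(row["profit_loss"])) for row in rows), Decimal("0"))


# Hypothetical rows, shaped like output an extraction step might return.
rows = [
    {"instrument_name": "BOND A", "profit_loss": "1250.50"},
    {"instrument_name": "FUND B", "profit_loss": "-300.25"},
]

total = total_profit_loss(rows)
print(to_european(total))  # 950,25
```

Using `Decimal` keeps the sum exact, which is the main reason to move arithmetic out of the model and into deterministic code.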

Key insights

LM hints in Sparrow enable LLMs to generate and format data fields dynamically during query execution.

Principles

Method

Define field-specific rules in a JSON hint file, including calculation logic or formatting instructions. Pass these hints with the query to the LLM, which then processes and returns the augmented data.
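The method above can be sketched as follows. This is a minimal illustration only: the hint keys, the rule wording, the risk rule itself, and the `build_request` helper are all hypothetical and do not reflect Sparrow's documented schema or API. It shows only the general shape of keeping hints separate from the query.

```python
import json

# Hypothetical hints: one natural-language rule per field.
# The field names and rule text are illustrative assumptions.
hints = {
    "risk_category": "Not in the document. Set to 'high' if profit_loss is negative, otherwise 'low'.",
    "instrument_name": "Return the name in upper case.",
    "valuation": "Format using European notation, e.g. 1.234,56.",
}

# Fields to extract, including one that exists only because a hint derives it.
query = {
    "instrument_name": "str",
    "valuation": "str",
    "profit_loss": "float",
    "risk_category": "str",
}


def build_request(query: dict, hints: dict) -> dict:
    """Bundle query and hints as separate parameters (hypothetical payload shape)."""
    unknown = set(hints) - set(query)
    if unknown:
        raise ValueError(f"hints reference fields missing from the query: {sorted(unknown)}")
    return {"query": json.dumps(query), "hints": json.dumps(hints)}


request = build_request(query, hints)
print(request["hints"])
```

Checking that every hinted field also appears in the query is a design choice of this sketch, intended to catch typos in field names before the model is called.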

Best for: AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Andrej Baranovskij.