Smart Document Extraction with Business Rules — Gemma vs Qwen vs Ministral
Summary
The Sparrow platform introduces "LM hints," a JSON-based mechanism that lets users instruct large language models (LLMs) to generate or format data fields during query execution, even when those fields do not exist in the original document. Hints can define new calculated fields, such as a "risk category" derived from profit/loss values, or reformat existing data, for example shortening instrument names or converting numerical values to European number format. The system passes the hints as a separate parameter alongside the query, and the LLM interprets them to produce the requested output. The article demonstrates this by testing Gemma 4 (31B dense), Qwen 3.6 (27B dense), and Ministral (14B, 8-bit quantized), comparing their accuracy and performance when applying these rules to a banking table of instrument positions.
Key takeaway
For AI Engineers or ML Directors evaluating LLM integration for data processing, Sparrow's LM hints extend what the LLM returns beyond simple extraction. Consider using them for logical, rule-based calculations and data formatting directly within the query, rather than in post-processing. For complex mathematical operations, however, offloading the calculation to a backend Python script remains advisable to ensure accuracy.
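As an illustration of the backend option, the following is a minimal sketch of computing derived numeric fields in Python after extraction. The field names (`cost_basis`, `profit_loss`, `profit_loss_pct`), the sample rows, and the percentage calculation are assumptions made for the example; they are not taken from the article or from Sparrow.

```python
from decimal import Decimal, ROUND_HALF_UP

def profit_loss_percent(profit_loss: str, cost_basis: str) -> Decimal:
    """Profit/loss as a percentage of cost basis, rounded to two decimals."""
    pl, cost = Decimal(profit_loss), Decimal(cost_basis)
    if cost == 0:
        raise ValueError("cost basis is zero; percentage is undefined")
    return (pl / cost * 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Illustrative rows, standing in for values an LLM extracted as strings.
extracted = [
    {"instrument": "Fund A", "cost_basis": "12000.00", "profit_loss": "-450.30"},
    {"instrument": "Bond B", "cost_basis": "8000.00", "profit_loss": "120.00"},
]

# Arithmetic is done in exact decimal math instead of being asked of the model.
for row in extracted:
    row["profit_loss_pct"] = profit_loss_percent(row["profit_loss"], row["cost_basis"])

total_profit_loss = sum((Decimal(r["profit_loss"]) for r in extracted), Decimal("0"))
```

Using `Decimal` rather than `float` keeps monetary values exact, which is the main reason to move this kind of arithmetic out of the prompt.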
Key insights
LM hints in Sparrow enable LLMs to generate and format data fields dynamically during query execution.
Principles
- LLMs can infer rules from general examples.
- Hints allow LLMs to create non-existent fields.
Method
Define field-specific rules in a JSON hint file, including calculation logic or formatting instructions. Pass these hints with the query to the LLM, which then processes and returns the augmented data.
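The method can be sketched roughly as follows. The hint layout (field name mapped to a plain-language rule), the query shape, the risk rule itself, and the `build_request` helper are all hypothetical and chosen for illustration; Sparrow's actual hint schema and API may differ.

```python
import json

# Hypothetical hints: one plain-language rule per output field.
hints = {
    "risk_category": (
        "This field is not in the document. "
        "Set it to 'high' if profit_loss is negative, otherwise 'low'."
    ),
    "instrument_name": "Return a shorter, descriptive name.",
    "valuation": "Format the number in European style, e.g. 1.234,56.",
}

# Hypothetical query: the fields to return, including one the document lacks.
query = [{
    "instrument_name": "str",
    "valuation": "str",
    "profit_loss": "str",
    "risk_category": "str",
}]

def build_request(query: list, hints: dict) -> dict:
    """Bundle the query and the hints as two separate JSON parameters."""
    unknown = [field for field in hints if field not in query[0]]
    if unknown:
        raise ValueError(f"hints refer to fields absent from the query: {unknown}")
    return {
        "query": json.dumps(query),
        "hints": json.dumps(hints, ensure_ascii=False),
    }

request = build_request(query, hints)
```

Keeping the hints separate from the query mirrors the description above: the query states which fields to return, while the hints state how each one should be produced.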
In practice
- Calculate derived fields like "risk category."
- Reformat numbers to locale-specific standards.
- Generate shorter, descriptive instrument names.
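When comparing models on formatting rules like these, a deterministic reference can help check the output. The sketch below assumes "European format" means a dot as the thousands separator and a comma as the decimal mark with two decimals; the helper names are hypothetical and not part of Sparrow.

```python
from decimal import Decimal

# Swap ',' and '.' in a US-style formatted number.
_SWAP_SEPARATORS = str.maketrans(",.", ".,")

def format_european(value: Decimal) -> str:
    """Format with '.' for thousands and ',' for decimals (two places)."""
    return f"{value:,.2f}".translate(_SWAP_SEPARATORS)

def matches_european(llm_value: str, source_value: str) -> bool:
    """True if the LLM's formatted string equals the reference formatting."""
    return llm_value.strip() == format_european(Decimal(source_value))
```

For example, `matches_european("1.234,50", "1234.5")` accepts a correctly reformatted value, while a value left in US style such as `"1,234.50"` is rejected.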
Topics
- Sparrow Hints
- Large Language Models
- Document Extraction
- Business Rules
- Gemma 4
Best for: AI Engineer, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Andrej Baranovskij.