ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images
Summary
ExStrucTiny is a new benchmark dataset for structured Information Extraction (IE) from document images, built to address limitations of existing datasets for enterprise document processing. Current Vision Language Models (VLMs) struggle with holistic, fine-grained structured extraction across diverse document types and flexible schemas. Existing Key Entity Extraction (KEE), Relation Extraction (RE), and Visual Question Answering (VQA) datasets do not adequately cover this gap because they rely on narrow entity ontologies, simple queries, or homogeneous document types. ExStrucTiny unifies aspects of KEE, RE, and VQA and covers more varied document types and extraction scenarios. It was built with a novel pipeline that combines manual samples with synthetic, human-validated samples. An analysis of open and closed VLMs on ExStrucTiny reveals challenges such as schema adaptation, query under-specification, and answer localization.
Key takeaway
For research scientists developing or evaluating Vision Language Models for enterprise applications, ExStrucTiny offers a critical benchmark to assess performance on schema-variable structured information extraction. You should use this dataset to identify and address VLM weaknesses in handling diverse document types, adapting to flexible schemas, and localizing answers, thereby improving model robustness for real-world data archiving and automated workflows.
Key insights
ExStrucTiny is a new benchmark for structured information extraction from diverse, schema-variable document images.
Principles
- Enterprise documents require holistic IE.
- VLMs need schema adaptation capabilities.
- Benchmarks must reflect real-world diversity.
Method
ExStrucTiny was built using a novel pipeline that combines manual samples with synthetic, human-validated samples to create a diverse dataset for structured IE.
In practice
- Test VLMs on schema adaptation.
- Evaluate how models handle under-specified queries.
- Focus on answer localization accuracy.
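As a rough illustration of what testing schema adaptation can look like, the sketch below scores a model's nested JSON output against a gold extraction by flattening both into (path, value) pairs and computing an exact-match F1. This is not the benchmark's official metric, which the summary does not describe. The function names `flatten` and `field_f1`, the normalization (trim and lowercase), and the sample invoice data are all assumptions made for illustration.

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, str]:
    """Flatten nested dicts/lists into {path: normalized leaf value}."""
    if isinstance(obj, dict):
        items = ((f"{prefix}.{k}" if prefix else str(k), v) for k, v in obj.items())
    elif isinstance(obj, list):
        items = ((f"{prefix}[{i}]", v) for i, v in enumerate(obj))
    else:
        # Leaf value: compare as trimmed, lowercased text (an assumed normalization).
        return {prefix: str(obj).strip().lower()}
    flat: dict[str, str] = {}
    for path, value in items:
        flat.update(flatten(value, path))
    return flat


def field_f1(predicted: Any, gold: Any) -> float:
    """Exact-match F1 over (path, value) pairs of the flattened outputs."""
    pred, ref = flatten(predicted), flatten(gold)
    matches = sum(1 for path, value in pred.items() if ref.get(path) == value)
    if not pred or not ref or matches == 0:
        return 0.0
    precision = matches / len(pred)
    recall = matches / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold extraction and model prediction for one document.
gold = {"invoice": {"number": "INV-001", "total": "42.00"}, "items": [{"name": "Widget"}]}
pred = {"invoice": {"number": "inv-001", "total": "40.00"}, "items": [{"name": "Widget"}]}

print(round(field_f1(pred, gold), 3))
```

Because the score is keyed on field paths, a prediction that uses the wrong key names or nesting for the requested schema loses credit even when the extracted values are right, which is one simple way to surface schema-adaptation failures. Answer localization would need a separate check, for example against page or bounding-box annotations.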
Topics
- Structured Information Extraction
- Document Understanding
- Vision Language Models
- Benchmark Datasets
- Schema Adaptation
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.