ExStrucTiny: A Benchmark for Schema-Variable Structured Information Extraction from Document Images

2026-02-12 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, quick

Summary

ExStrucTiny is a new benchmark dataset for structured Information Extraction (IE) from document images, built to address limitations of existing datasets for enterprise document processing. Current Vision Language Models (VLMs) struggle with holistic, fine-grained structured extraction across diverse document types and flexible schemas. Existing Key Entity Extraction (KEE), Relation Extraction (RE), and Visual Question Answering (VQA) datasets do not adequately cover this gap because they rely on narrow entity ontologies, simple queries, or homogeneous document types. ExStrucTiny unifies aspects of KEE, RE, and VQA and covers more varied document types and extraction scenarios. It was built with a novel pipeline that combines manually created samples with synthetic, human-validated ones. An analysis of open and closed VLMs on ExStrucTiny reveals challenges in schema adaptation, query under-specification, and answer localization.

Key takeaway

For research scientists developing or evaluating Vision Language Models for enterprise applications, ExStrucTiny offers a critical benchmark to assess performance on schema-variable structured information extraction. You should use this dataset to identify and address VLM weaknesses in handling diverse document types, adapting to flexible schemas, and localizing answers, thereby improving model robustness for real-world data archiving and automated workflows.

Key insights

ExStrucTiny is a new benchmark for structured information extraction from diverse, schema-variable document images.

Principles

Enterprise documents require holistic IE.
VLMs need schema adaptation capabilities.
Benchmarks must reflect real-world diversity.

Method

ExStrucTiny was built using a novel pipeline that combines manually created samples with synthetic, human-validated samples to produce a diverse dataset for structured IE.

In practice

Test how well VLMs adapt to schemas that change from query to query.
Evaluate how models handle under-specified queries.
Measure answer localization accuracy.
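
As a rough illustration of the first check, the sketch below scores a model's structured output against a reference when the schema differs from query to query. It is not the benchmark's own metric, which this summary does not describe: the function names `flatten` and `field_f1`, the exact-match rule after whitespace and case normalization, and the sample invoice fields are all assumptions made for the example. Answer localization is not covered here.

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, str]:
    """Flatten nested dicts/lists into {path: normalized leaf value}.

    Hypothetical helper: works for any schema because it only walks
    the structure it is given and never assumes fixed field names.
    """
    items: dict[str, str] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten(value, path))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}[{index}]"))
    elif obj is not None:
        # Normalize case and whitespace so trivial differences are ignored.
        items[prefix] = " ".join(str(obj).lower().split())
    return items


def field_f1(predicted: Any, gold: Any) -> dict[str, float]:
    """Field-level precision, recall, and F1 over flattened leaf paths."""
    pred = flatten(predicted)
    ref = flatten(gold)
    correct = sum(1 for path, value in pred.items() if ref.get(path) == value)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example: one quantity is wrong, everything else matches.
gold = {
    "vendor": {"name": "Acme Corp"},
    "items": [{"desc": "Widget", "qty": 2}],
    "total": "41.50",
}
predicted = {
    "vendor": {"name": "acme  corp"},
    "items": [{"desc": "Widget", "qty": 3}],
    "total": "41.50",
}
scores = field_f1(predicted, gold)
print(scores)
```

Because the comparison is keyed on paths rather than a fixed ontology, the same function can be reused across queries with different schemas. A stricter or more lenient matching rule (for example, fuzzy string matching or order-insensitive lists) would be a separate design choice.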

Topics

Structured Information Extraction
Document Understanding
Vision Language Models
Benchmark Datasets
Schema Adaptation

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.