DIAGRAMS: A Review Framework for Reasoning-Level Attribution in Diagram QA

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering) · Depth: Expert, extended

Summary

Diagrams is a new review framework for reasoning-level attribution in Diagram Question Answering (Diagram QA). Reasoning-level attribution links each question-answer pair to every visual region needed to derive the answer, not just to the final response. Creating this kind of structured evidence is time-consuming across visual domains as diverse as diagrams, charts, maps, and circuits, and existing annotation tools are tightly coupled to dataset-specific formats. Diagrams addresses both problems with a schema-driven design: an internal meta-schema and per-dataset adapters decouple the interface logic from dataset structures. The framework proposes QA-conditioned evidence regions, and when QA pairs or candidate regions are missing, it generates them for human verification and refinement. Across six Diagram QA datasets, model-suggested evidence achieved 85.39% precision and 75.30% recall (micro-averaged) against reviewer-final selections, which suggests that reviewers need to create far fewer regions by hand while still agreeing closely with the model's proposals.
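To make the "micro-averaged" qualifier concrete, here is a minimal Python sketch of how such figures are typically computed: true positives, false positives, and false negatives are pooled across all datasets before dividing, so larger datasets carry more weight. The summary does not say how suggested regions are matched to reviewer-final regions, so the sketch assumes that matching has already produced per-dataset counts. The `Counts` class, the function name, and the toy numbers are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Counts:
    """Per-dataset region counts (hypothetical structure, not from the paper)."""
    tp: int  # suggested regions the reviewer kept
    fp: int  # suggested regions the reviewer discarded
    fn: int  # regions the reviewer added that the model did not suggest


def micro_precision_recall(per_dataset: Iterable[Counts]) -> Tuple[float, float]:
    """Pool counts over all datasets, then compute precision and recall once."""
    items = list(per_dataset)
    tp = sum(c.tp for c in items)
    fp = sum(c.fp for c in items)
    fn = sum(c.fn for c in items)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy counts for illustration only; these are NOT the paper's data.
toy_counts = [Counts(tp=8, fp=2, fn=2), Counts(tp=1, fp=0, fn=3)]
precision, recall = micro_precision_recall(toy_counts)
print(f"precision={precision:.4f} recall={recall:.4f}")
```

With the toy counts, the pooled totals are 9 true positives, 2 false positives, and 5 false negatives, so precision is 9/11 and recall is 9/14. A macro average would instead compute the two ratios per dataset and average them, which weights small and large datasets equally.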

Key takeaway

Research scientists who develop Diagram QA benchmarks or build grounded supervision datasets should consider integrating the Diagrams framework. Its review-first workflow, which leverages model-suggested evidence, can substantially reduce the manual effort of creating reasoning-level attributions. Annotators verify and refine proposals rather than drawing regions from scratch, which accelerates dataset creation and improves the fidelity of supervision for vision-language models.

Key insights

Diagrams streamlines reasoning-level attribution in Diagram QA by decoupling annotation logic from dataset formats.

Method

The Diagrams framework ingests dataset records, normalizes them to a meta-schema, uses a multimodal backend for QA-conditioned evidence selection or generation, and supports human verification and refinement of proposed regions and QA pairs.
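The adapter idea in the ingestion and normalization steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the summary does not describe the actual meta-schema, so the `Record`, `Region`, and `QAPair` fields, the adapter registry, the `toy_charts` dataset name, and its raw field names (`img`, `boxes`, `questions`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Assumed box convention for this sketch: (x, y, width, height) in pixels.
BBox = Tuple[float, float, float, float]


@dataclass
class Region:
    region_id: str
    bbox: BBox
    label: Optional[str] = None


@dataclass
class QAPair:
    question: str
    answer: str
    # IDs of regions marked as evidence; empty until proposed or reviewed.
    evidence_ids: List[str] = field(default_factory=list)


@dataclass
class Record:
    """Hypothetical meta-schema record that interface code would consume."""
    image_path: str
    regions: List[Region]
    qa_pairs: List[QAPair]


Adapter = Callable[[dict], Record]
ADAPTERS: Dict[str, Adapter] = {}


def register(name: str) -> Callable[[Adapter], Adapter]:
    """Register an adapter under a dataset name."""
    def wrap(fn: Adapter) -> Adapter:
        ADAPTERS[name] = fn
        return fn
    return wrap


@register("toy_charts")  # hypothetical dataset with its own raw field names
def toy_charts_adapter(raw: dict) -> Record:
    regions = [
        Region(region_id=str(i), bbox=tuple(b["box"]), label=b.get("text"))
        for i, b in enumerate(raw.get("boxes", []))
    ]
    qa_pairs = [QAPair(q["q"], q["a"]) for q in raw.get("questions", [])]
    return Record(image_path=raw["img"], regions=regions, qa_pairs=qa_pairs)


def normalize(dataset: str, raw: dict) -> Record:
    """Map a dataset-specific record to the shared meta-schema."""
    return ADAPTERS[dataset](raw)


raw_example = {
    "img": "chart_001.png",
    "boxes": [{"box": [10, 20, 30, 40], "text": "Q1 revenue"}],
    "questions": [{"q": "Which quarter is highest?", "a": "Q1"}],
}
record = normalize("toy_charts", raw_example)
```

Under this design, supporting a new dataset means writing one adapter function. The review interface and the evidence-proposal backend only ever see `Record` objects, which is the decoupling the summary describes.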

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.