ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, extended

Summary

ChartNet is a million-scale multimodal dataset designed to improve chart interpretation and reasoning in vision-language models (VLMs). It contains 1.5 million chart samples covering 24 chart types and 6 plotting libraries, generated through a code-guided synthesis pipeline. Each sample has five aligned components: plotting code, rendered chart image, data table, natural language summary, and question-answer pairs with reasoning. The dataset also includes specialized subsets: human-annotated data, real-world charts from sources such as the World Bank and Pew Research Center, safety data, and grounding annotations. A quality-filtering pipeline screens samples for visual fidelity and semantic accuracy. Fine-tuning VLMs on ChartNet consistently improves performance across benchmarks, and ChartNet-tuned models often outperform much larger off-the-shelf models, including GPT-4o, on chart reconstruction, data extraction, and summarization.
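The five aligned components can be pictured as one record per chart. The sketch below is only an illustration of that idea: the class and field names (`ChartSample`, `QAPair`, `plotting_code`, and so on) are hypothetical and are not taken from the released dataset's schema.

```python
from dataclasses import dataclass, field


@dataclass
class QAPair:
    question: str
    reasoning: str  # step-by-step rationale leading to the answer
    answer: str


@dataclass
class ChartSample:
    # Hypothetical field names; the released dataset may name these differently.
    plotting_code: str
    image_path: str
    data_table: list
    summary: str
    qa: list = field(default_factory=list)

    def missing_components(self):
        """Return the names of components that are empty."""
        checks = {
            "plotting_code": self.plotting_code.strip(),
            "image_path": self.image_path.strip(),
            "data_table": self.data_table,
            "summary": self.summary.strip(),
            "qa": self.qa,
        }
        return [name for name, value in checks.items() if not value]


sample = ChartSample(
    plotting_code="plt.bar(['A', 'B'], [3, 7])",
    image_path="charts/000001.png",
    data_table=[{"label": "A", "value": 3}, {"label": "B", "value": 7}],
    summary="Category B is more than twice as large as category A.",
    qa=[QAPair("Which category is larger?", "A is 3 and B is 7, and 7 > 3.", "B")],
)
print(sample.missing_components())
```

Keeping all five components in one record makes it straightforward to reject samples where any modality is absent before they reach training.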

Key takeaway

For AI engineers and research scientists building or deploying VLMs for data-visualization tasks, ChartNet supplies million-scale, code-aligned multimodal training data. Fine-tuning on it improves chart reconstruction, data extraction, and complex reasoning, and the tuned models often surpass larger general-purpose models. Consider adding ChartNet to training pipelines for applications that require precise numerical and semantic interpretation of charts.

Key insights

Code-guided synthetic data generation significantly improves VLM chart understanding and reasoning capabilities.

Method

A code-guided synthesis pipeline generates diverse chart samples with aligned multimodal components, including image, code, data, text summary, and QA with Chain-of-Thought reasoning, followed by rigorous quality filtering.
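A toy version of this flow may help make "code-guided" concrete. The sketch below is an assumption-laden illustration, not the authors' pipeline: it emits matplotlib code as text from a small table, derives a QA item with reasoning from the same table, and applies a simple consistency filter that re-parses the code. The function names (`synthesize_code`, `make_qa`, `table_from_code`, `passes_filter`) are hypothetical, rendering is omitted, and the real filter described in the paper targets visual fidelity and semantic accuracy rather than this narrow check.

```python
import ast


def synthesize_code(table, chart_type="bar"):
    """Emit matplotlib plotting code (as text) for a label/value table."""
    labels = [row["label"] for row in table]
    values = [row["value"] for row in table]
    return (
        "import matplotlib.pyplot as plt\n"
        f"labels = {labels!r}\n"
        f"values = {values!r}\n"
        f"plt.{chart_type}(labels, values)\n"
        "plt.savefig('chart.png')\n"
    )


def make_qa(table):
    """Derive one question with a short reasoning trace from the table."""
    top = max(table, key=lambda row: row["value"])
    listed = ", ".join(f"{row['label']}={row['value']}" for row in table)
    return {
        "question": "Which category has the highest value?",
        "reasoning": f"The values are {listed}; the largest is {top['value']}.",
        "answer": top["label"],
    }


def table_from_code(code):
    """Recover the table from the literal assignments in the plotting code."""
    found = {}
    for node in ast.parse(code).body:
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            name = node.targets[0].id
            if name in ("labels", "values"):
                found[name] = ast.literal_eval(node.value)
    pairs = zip(found["labels"], found["values"])
    return [{"label": label, "value": value} for label, value in pairs]


def passes_filter(sample):
    """Toy quality check: the code must parse and agree with the stored table."""
    try:
        return table_from_code(sample["code"]) == sample["table"]
    except (SyntaxError, KeyError, ValueError):
        return False


table = [{"label": "A", "value": 3}, {"label": "B", "value": 7}]
sample = {"code": synthesize_code(table), "table": table, "qa": make_qa(table)}
print(passes_filter(sample), sample["qa"]["answer"])
```

Because the plotting code is the source of truth, the table and the QA item can both be checked against it without inspecting pixels; a production pipeline would still need to render the chart and validate the image itself.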

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.