A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

2026-05-28 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, short

Summary

Jasper AI has released MONET, an open-source image-text dataset under the Apache 2.0 license, now available on Hugging Face. It comprises 104.9 million high-quality samples refined from an initial pool of 2.9 billion images. Each sample comes with captions and detailed metadata, and the full dataset occupies 68 TB. A companion paper (arxiv.org/abs/2605.21272) details how the dataset was filtered, deduplicated, and re-captioned with multiple vision-language models (VLMs), a combination presented as unique among large-scale text-to-image (T2I) pre-training datasets. The dataset also incorporates approximately 15 million synthetic images, clearly marked in the metadata, to enhance T2I model training. Complementary tools include a UMAP visualization, a text/image retrieval tool, and a codebase (nano-t2i) for training text-to-image models.

Key takeaway

For AI Scientists and Machine Learning Engineers developing text-to-image models, MONET is a ready-made pre-training resource. Consider integrating this 104.9 million-sample, multi-VLM re-captioned dataset into your model's pre-training. The nano-t2i codebase and the companion tools can streamline development, potentially reducing the effort previously required for dataset assembly and curation.

Key insights

MONET offers a 104.9M-sample image-text dataset for large-scale T2I model pre-training, re-captioned with multiple VLMs, a feature presented as unique among datasets of this kind.

Principles

Open-source datasets accelerate research.
Multi-VLM re-captioning enhances data quality.
Metadata is crucial for synthetic content.

Method

MONET was built by refining 2.9 billion images to 104.9 million high-quality samples, applying filtering, deduplication, and multi-VLM re-captioning, with clear metadata for synthetic images.
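
To put the figures above in proportion, the short calculation below derives three ratios from the numbers quoted in this summary: the share of the initial pool that survived curation, the share of synthetic images, and the average storage per sample. It is a back-of-the-envelope sketch, assuming 1 TB = 10^12 bytes and taking the "approximately 15 million" synthetic count at face value; none of these derived ratios are stated in the source.

```python
# Figures quoted in the summary (decimal units assumed for storage).
initial_pool = 2_900_000_000      # images before curation
final_samples = 104_900_000       # samples in the released dataset
synthetic_samples = 15_000_000    # approximate count, per the summary
total_bytes = 68 * 10**12         # 68 TB, assuming 1 TB = 10^12 bytes

# Fraction of the initial pool kept after filtering and deduplication.
retention = final_samples / initial_pool

# Fraction of the released samples that are synthetic.
synthetic_share = synthetic_samples / final_samples

# Average storage per sample (image plus captions and metadata).
avg_bytes_per_sample = total_bytes / final_samples

print(f"retained after curation: {retention:.1%}")
print(f"synthetic share: {synthetic_share:.1%}")
print(f"average size per sample: {avg_bytes_per_sample / 1000:.0f} kB")
```

Under these assumptions, roughly 3.6% of the initial pool was retained, about 14% of the released samples are synthetic, and a sample averages around 650 kB.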

In practice

Use MONET for T2I model pre-training.
Explore the UMAP visualization to inspect the dataset's distribution.
Use the retrieval tool for text and image search.
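
Because synthetic images are marked in the metadata, one practical step before pre-training is to separate them from real images. The sketch below shows the idea on toy records in plain Python. The field name `is_synthetic` and the record layout are hypothetical, chosen only for illustration; the actual column names should be taken from the dataset card on Hugging Face.

```python
from typing import Dict, Iterable, Iterator


def real_only(records: Iterable[Dict], flag: str = "is_synthetic") -> Iterator[Dict]:
    """Yield records whose synthetic flag is absent or falsy.

    `flag` is a hypothetical metadata field name, not MONET's confirmed schema.
    """
    for record in records:
        if not record.get(flag, False):
            yield record


# Toy records standing in for dataset rows (layout is an assumption).
samples = [
    {"id": 1, "caption": "a harbor at dusk", "is_synthetic": False},
    {"id": 2, "caption": "a robot painting", "is_synthetic": True},
    {"id": 3, "caption": "a field of poppies"},  # flag missing: treated as real
]

kept = list(real_only(samples))
print([record["id"] for record in kept])
```

The generator form processes one record at a time, which suits a dataset of this size when it is read as a stream rather than loaded into memory. Treating a missing flag as "real" is a design choice of this sketch; a stricter pipeline might reject such records instead.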

Topics

Image-Text Datasets
Text-to-Image Models
Dataset Curation
Hugging Face Datasets
Apache 2.0 License
Multi-VLM Recaptioning

Code references

gojasper/nano-t2i

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.