Arena Leaderboard Dataset

2026-04-02 · Source: Arena Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Emerging Technologies & Innovation · Depth: Intermediate, quick

Summary

Arena has released its entire three-year history of AI capability leaderboards as a public dataset on Hugging Face, available at https://huggingface.co/datasets/lmarena-ai/leaderboard-dataset. The dataset spans 10 current arenas, dozens of categories, and hundreds of models, organized into 14 subsets across those arenas; some arenas also offer a style-controlled variant. Each subset has two splits: "latest", which holds the most recent leaderboard, and "full", which holds every historical entry. The layout follows Hugging Face's Subsets and Splits concepts, and entries also record their category and publish date. The release includes example analyses: tracking model progress (the mean score of the Text Arena's top five models has risen from about 1,000 to nearly 1,500 since May 2023), charting the total number of tested models over time, and comparing the share of proprietary and open-source models across arenas.
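The top-five trend mentioned above can be approximated with a short aggregation. This is a minimal sketch, assuming rows from a "full" split are plain dicts with hypothetical `publish_date` (ISO-8601 string) and `rating` keys; the real column names may differ, and the toy values below are invented, not Arena data.

```python
from collections import defaultdict
from statistics import mean


def top_k_mean_by_date(rows, k=5):
    """Mean rating of the top-k models at each leaderboard publish date.

    Field names "publish_date" and "rating" are hypothetical placeholders.
    """
    ratings_by_date = defaultdict(list)
    for row in rows:
        ratings_by_date[row["publish_date"]].append(row["rating"])

    result = {}
    for date in sorted(ratings_by_date):  # ISO dates sort chronologically
        top = sorted(ratings_by_date[date], reverse=True)[:k]
        result[date] = mean(top)
    return result


# Invented toy rows standing in for one arena's "full" split.
rows = [
    {"publish_date": "2023-05-01", "model": "a", "rating": 1000.0},
    {"publish_date": "2023-05-01", "model": "b", "rating": 980.0},
    {"publish_date": "2023-05-01", "model": "c", "rating": 900.0},
    {"publish_date": "2024-05-01", "model": "a", "rating": 1100.0},
    {"publish_date": "2024-05-01", "model": "d", "rating": 1300.0},
    {"publish_date": "2024-05-01", "model": "e", "rating": 1200.0},
]

print(top_k_mean_by_date(rows, k=2))  # → {'2023-05-01': 990.0, '2024-05-01': 1250.0}
```

With `k=5`, as in the Text Arena example, dates with fewer than five entries simply average whatever models are present.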

Key takeaway

AI and research scientists analyzing model performance trends should use the new Arena Leaderboard Dataset on Hugging Face. It supports tracking historical model progress, comparing proprietary and open-source adoption across modalities, and seeing how AI evaluation has scaled. Use the "full" splits to observe long-term score changes and spot emerging patterns in AI capabilities.

Key insights

Arena released its historical AI capability leaderboards as a public dataset for community analysis and open science.

Principles

Historical leaderboard data reveals AI progress.
Dataset segmentation aids granular analysis.
License data enables open-source vs. proprietary comparisons.

Method

The dataset is organized with Hugging Face's Subsets and Splits: subsets segment the data by arena (14 subsets across 10 arenas), and each subset offers a "latest" split and a "full" historical split that segments the data across time.
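To make the split semantics concrete, the sketch below rebuilds a "latest"-style view from "full"-split rows by keeping only the newest publish date. It assumes hypothetical `publish_date` and `category` fields and that "latest" corresponds to the most recent snapshot; neither is confirmed by the source. Fetching real rows would typically go through the `datasets` library, e.g. `load_dataset("lmarena-ai/leaderboard-dataset", name=SUBSET, split="full")`, where `SUBSET` is a placeholder because the 14 subset names are not listed here.

```python
def latest_snapshot(full_rows, category=None):
    """Rebuild a "latest"-style view from "full"-split rows.

    Keeps only rows from the most recent publish date, optionally
    restricted to one category. Field names are hypothetical.
    """
    # Optionally narrow to a single category before finding the newest date.
    rows = [r for r in full_rows if category is None or r["category"] == category]
    if not rows:
        return []
    # ISO-8601 date strings compare lexicographically in chronological order.
    newest = max(r["publish_date"] for r in rows)
    return [r for r in rows if r["publish_date"] == newest]


# Invented toy rows; not real Arena data.
full_rows = [
    {"publish_date": "2023-05-01", "category": "overall", "model": "a"},
    {"publish_date": "2024-05-01", "category": "overall", "model": "a"},
    {"publish_date": "2024-05-01", "category": "overall", "model": "b"},
    {"publish_date": "2024-06-01", "category": "coding", "model": "c"},
]

print([r["model"] for r in latest_snapshot(full_rows, category="overall")])  # → ['a', 'b']
```

Filtering by category first matters if categories were published on different dates: without the filter, the example above would return only the "coding" row.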

In practice

Track top model score changes over time.
Visualize growth in tested AI models.
Compare open-source vs. proprietary model rates.
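The last two analyses in this list reduce to simple counting over rows. This is a minimal sketch, assuming rows carry hypothetical `arena`, `publish_date`, `model`, and `license` fields; the rule that any license other than "proprietary" counts as open-source is a placeholder, since the source does not say how licenses are classified.

```python
from collections import defaultdict


def cumulative_models_by_date(rows):
    """Count distinct models that have appeared up to and including each publish date."""
    models_by_date = defaultdict(set)
    for row in rows:
        models_by_date[row["publish_date"]].add(row["model"])

    seen = set()
    counts = {}
    for date in sorted(models_by_date):  # ISO dates sort chronologically
        seen |= models_by_date[date]
        counts[date] = len(seen)
    return counts


def open_share_by_arena(rows):
    """Fraction of distinct models in each arena classified as open-source."""
    is_open = defaultdict(dict)  # arena -> {model: bool}
    for row in rows:
        # Placeholder rule: any license other than "proprietary" counts as open.
        is_open[row["arena"]][row["model"]] = row["license"].lower() != "proprietary"
    return {arena: sum(flags.values()) / len(flags) for arena, flags in sorted(is_open.items())}


# Invented toy rows; not real Arena data.
rows = [
    {"arena": "text", "publish_date": "2023-05-01", "model": "a", "license": "Proprietary"},
    {"arena": "text", "publish_date": "2023-05-01", "model": "b", "license": "Apache 2.0"},
    {"arena": "text", "publish_date": "2024-05-01", "model": "a", "license": "Proprietary"},
    {"arena": "text", "publish_date": "2024-05-01", "model": "c", "license": "Proprietary"},
    {"arena": "vision", "publish_date": "2024-05-01", "model": "d", "license": "MIT"},
]

print(cumulative_models_by_date(rows))  # → {'2023-05-01': 2, '2024-05-01': 4}
print(open_share_by_arena(rows))  # → {'text': 0.3333333333333333, 'vision': 1.0}
```

Counting distinct models per arena, rather than rows, keeps a model that stays on the leaderboard across many publish dates from being weighted more heavily.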

Topics

AI Leaderboards
Model Evaluation
Hugging Face Datasets
Open-Source AI
Proprietary Models
AI Performance Trends

Best for: AI Scientist, Research Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Arena Blog.