Arena Leaderboard Dataset
Summary
Arena has released its entire three-year history of AI capability leaderboards as a public dataset on Hugging Face, available at https://huggingface.co/datasets/lmarena-ai/leaderboard-dataset. The dataset covers 10 current arenas, dozens of categories, and hundreds of models, and is divided into 14 subsets, with some arenas also offering style-controlled variants. Each subset has two splits: "latest", which holds the most recent data, and "full", which holds all historical entries. The organization follows Hugging Face's Subsets and Splits concepts and includes categories and publish dates. The example analyses show how to track model progress: the mean score of the Text Arena's top five models rose from 1,000 to nearly 1,500 since May 2023. Other examples visualize the total number of tested models over time and compare the distribution of proprietary versus open-source models across arenas.
Key takeaway
AI and research scientists analyzing model performance trends can use the new Arena Leaderboard Dataset on Hugging Face to track historical model progress, compare proprietary versus open-source adoption across modalities, and see how AI evaluation has scaled. Use the "full" splits to observe long-term score changes and identify emerging patterns in AI capabilities.
Key insights
Arena released its historical AI capability leaderboards as a public dataset for community analysis and open science.
Principles
- Historical leaderboard data reveals AI progress.
- Dataset segmentation aids granular analysis.
- License data enables open-source vs. proprietary comparisons.
Method
The dataset is organized using Hugging Face's Subsets and Splits. Subsets segment the data by arena (14 subsets across the 10 arenas), and each subset's splits segment it in time: "latest" holds the most recent data and "full" holds the complete history.
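As a minimal sketch of this layout, the snippet below shows how one subset and split might be loaded with the Hugging Face `datasets` library. The repository id is taken from the dataset URL; the subset names are not listed in this summary, so they are looked up rather than assumed. The helper names `split_for`, `list_subsets`, and `load_leaderboard` are hypothetical and exist only for illustration.

```python
# Repository id taken from the dataset URL in the summary.
REPO_ID = "lmarena-ai/leaderboard-dataset"


def split_for(history: bool) -> str:
    """Return "full" for all historical entries, "latest" for the most recent data."""
    return "full" if history else "latest"


def list_subsets():
    """Return the subset (config) names published for the dataset.

    Needs network access and `pip install datasets`; imported lazily so the
    rest of this module works without the library installed.
    """
    from datasets import get_dataset_config_names

    return get_dataset_config_names(REPO_ID)


def load_leaderboard(subset: str, history: bool = False):
    """Load one subset. `subset` should be a name returned by list_subsets()."""
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=subset, split=split_for(history))
```

A typical session would call `list_subsets()` first, pick a name from the result, and pass it to `load_leaderboard(name, history=True)` to get the full history for that arena.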
In practice
- Track top model score changes over time.
- Visualize growth in tested AI models.
- Compare the share of open-source vs. proprietary models across arenas.
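The three analyses above can be sketched in plain Python over a list of row dictionaries. This is an illustration under stated assumptions, not the original article's code: the field names `publish_date`, `model_name`, `score`, and `license` are hypothetical and should be checked against the actual dataset columns, the set of license strings counted as open is supplied by the caller, and `SAMPLE_ROWS` is made-up data used only to exercise the functions.

```python
from collections import defaultdict
from statistics import mean

# Synthetic rows for illustration only; NOT real leaderboard data.
# Field names are assumptions about the dataset schema.
SAMPLE_ROWS = [
    {"publish_date": "2023-05-01", "model_name": "model-a", "score": 1000.0, "license": "Proprietary"},
    {"publish_date": "2023-05-01", "model_name": "model-b", "score": 900.0, "license": "Apache 2.0"},
    {"publish_date": "2024-05-01", "model_name": "model-a", "score": 1100.0, "license": "Proprietary"},
    {"publish_date": "2024-05-01", "model_name": "model-c", "score": 1300.0, "license": "MIT"},
]


def top_k_mean_by_date(rows, k=5):
    """Mean score of the k highest-scoring models on each publish date."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["publish_date"]].append(row["score"])
    # ISO-formatted date strings sort chronologically.
    return {d: mean(sorted(s, reverse=True)[:k]) for d, s in sorted(scores.items())}


def cumulative_model_count(rows):
    """Number of distinct models seen up to and including each publish date."""
    models = defaultdict(set)
    for row in rows:
        models[row["publish_date"]].add(row["model_name"])
    seen, counts = set(), {}
    for d in sorted(models):
        seen |= models[d]
        counts[d] = len(seen)
    return counts


def open_share(rows, open_licenses):
    """Fraction of distinct models whose license is in `open_licenses`."""
    license_of = {row["model_name"]: row["license"] for row in rows}
    if not license_of:
        return 0.0
    return sum(lic in open_licenses for lic in license_of.values()) / len(license_of)
```

Running these over the rows of a "full" split gives one value per publish date for the first two functions, which is the shape needed for a time-series plot. Calling `open_share` once per arena gives the figures for a cross-arena comparison.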
Topics
- AI Leaderboards
- Model Evaluation
- Hugging Face Datasets
- Open-Source AI
- Proprietary Models
- AI Performance Trends
Best for: AI Scientist, Research Scientist, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Arena Blog.