HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

2026-03-16 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, AI for Mathematical Discovery · Depth: Advanced, quick

Summary

HorizonMath is a new benchmark of more than 100 problems, most of them unsolved, spanning eight domains of computational and applied mathematics. It is designed to measure AI progress toward mathematical discovery. The benchmark ships with an open-source evaluation framework for automated verification and focuses on problems where finding a solution demands significant mathematical insight but checking one is computationally cheap. Because the solutions are unknown, the benchmark is robust against data contamination, and current state-of-the-art models typically score near 0%. Existing research-level benchmarks rely on expensive formal proof verification or manual review; HorizonMath instead scales through automated verification. In initial evaluations, GPT 5.4 Pro proposed solutions to two problems that potentially improve on the best-known published results. These possible novel contributions are awaiting expert review. HorizonMath is released as an open challenge and a community resource.

Key takeaway

For AI researchers working on mathematical reasoning, HorizonMath offers a contamination-resistant benchmark for testing models on unsolved problems. Teams may want to add it to their evaluation pipelines to probe genuine discovery capabilities. GPT 5.4 Pro has already proposed two solutions that potentially improve on best-known published results, pending expert review.

Key insights

HorizonMath benchmarks AI's ability to make progress on unsolved math problems, using automated, scalable verification.

Principles

Discovery is hard, verification can be simple.
Unknown solutions prevent data contamination.
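To make the first principle concrete, here is a small illustration using a classic combinatorics object, not a problem taken from HorizonMath. A Sidon set is a set of integers whose pairwise sums a + b (with a ≤ b) are all distinct. Finding the largest Sidon set inside {1, ..., N} is hard, but checking a proposed set takes only a quadratic number of additions. The function name `is_sidon_set` is hypothetical and chosen only for this sketch.

```python
def is_sidon_set(candidate: list[int]) -> bool:
    """Return True if every pairwise sum a + b (a <= b) is distinct.

    Illustrative verifier only; it is not part of the HorizonMath framework.
    Cost is O(n^2) sums, which stays cheap even when finding a large
    Sidon set is computationally hard.
    """
    elements = sorted(set(candidate))
    if len(elements) != len(candidate):
        return False  # a set may not contain repeated elements

    seen_sums: set[int] = set()
    for i, a in enumerate(elements):
        # Include b == a, since a + a must also differ from every other sum.
        for b in elements[i:]:
            s = a + b
            if s in seen_sums:
                return False
            seen_sums.add(s)
    return True


print(is_sidon_set([1, 2, 4, 8]))  # every pairwise sum is distinct
print(is_sidon_set([1, 2, 3]))     # fails, because 1 + 3 == 2 + 2
```

The verifier never needs to know the optimal answer. It only checks the defining property, which is why a benchmark built on such problems can stay automatic even when no solution is known.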

Method

HorizonMath evaluates AI on unsolved math problems using an open-source framework for automated, computationally efficient verification, which avoids the cost of formal proof verification or manual review.
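The evaluation loop described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the HorizonMath framework's actual API: each problem is assumed to pair a cheap verifier (returning a score for a valid candidate, or `None` for an invalid one) with a best-known baseline, and a candidate "improves" only if it is valid and beats that baseline. The names `Problem`, `Result`, `evaluate`, and `make_min_distance_verifier` are hypothetical. The example problem (spread n points in the unit square to maximize their minimum pairwise distance) is an illustration, not a HorizonMath task.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Problem:
    """Hypothetical problem record: a verifier plus a best-known baseline."""
    name: str
    verify: Callable[[object], Optional[float]]  # score if valid, else None
    best_known: float                             # higher score is better
    tol: float = 1e-9                             # guards against float noise


@dataclass
class Result:
    valid: bool
    score: Optional[float]
    improves: bool


def evaluate(problem: Problem, candidate: object) -> Result:
    """Verify a candidate and compare it against the best-known result."""
    score = problem.verify(candidate)
    if score is None:
        return Result(valid=False, score=None, improves=False)
    return Result(valid=True, score=score,
                  improves=score > problem.best_known + problem.tol)


def make_min_distance_verifier(n: int) -> Callable[[object], Optional[float]]:
    """Illustrative verifier: n points in [0, 1]^2, scored by minimum distance."""
    def verify(points: object) -> Optional[float]:
        try:
            pts = [(float(x), float(y)) for x, y in points]
        except (TypeError, ValueError):
            return None  # malformed submission
        if len(pts) != n:
            return None
        if any(not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0) for x, y in pts):
            return None  # a point lies outside the unit square (or is NaN)
        return min(math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])
    return verify


# Four points: the corners of the square achieve minimum distance 1.
four_points = Problem("four-points-in-square", make_min_distance_verifier(4), best_known=1.0)
print(evaluate(four_points, [(0, 0), (1, 0), (0, 1), (1, 1)]))
print(evaluate(four_points, [(0, 0), (1, 1), (0, 1), (2, 0)]))  # out of bounds
```

Keeping the verifier separate from the comparison step means the same harness can score any problem whose validity check is cheap, while the tolerance keeps floating-point noise from being reported as an improvement.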

In practice

Use HorizonMath to evaluate models on research-level math problems that resist data contamination.
Submit candidate solutions to the open challenge.

Topics

Mathematical Discovery
AI Benchmarking
Large Language Models
Automated Verification
Computational Mathematics

Best for: AI Researcher, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.