Best AI Models Today (ARC-AGI-2 Test)
Summary
The ARC-AGI-2 leaderboard, as updated on May 7, 2026, ranks large language models against a human-panel baseline of 100%. The top-performing model is GPT 5.5 XH High at 85%. Other notable GPT entries include GPT 5.5 Pro High at 84.66% and GPT 5.5 Medium at 70%. Gemini 3.1 Pro also performs strongly, and a Gemini 3 Deep Think entry from February 2026 scored 84.6%, but at a cost of $13, compared with about $1.87, or in some cases under a dollar, for the GPT models. Anthropic's Claude 4.7 Max scored 75% at a cost of $7; Claude 4.6 High and Claude 4.7 High also appear on the leaderboard.
Key takeaway
For AI engineers evaluating large language models for deployment, GPT 5.5 XH High is the first candidate to consider, given its leading 85% score on the ARC-AGI-2 leaderboard. Compare its cost against alternatives such as Gemini 3 Deep Think, which scores nearly the same (84.6%) at a significantly higher price ($13). The final choice should balance raw performance against budget.
Key insights
GPT 5.5 XH High leads the ARC-AGI-2 leaderboard at 85%, and the GPT 5.5 entries generally cost less than the Gemini and Claude entries cited here.
Principles
- Performance varies significantly across models.
- Cost-performance ratio is a critical evaluation metric.
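The cost-performance principle can be made concrete with a little arithmetic. The sketch below divides each score by its quoted cost to get "points per dollar" and ranks the entries by that value. The names `Entry`, `points_per_dollar`, and `rank_by_value` are illustrative, not part of any leaderboard tooling. The Gemini and Claude figures come from the summary above; the $1.87 cost attached to GPT 5.5 XH High is an assumption, since the summary quotes that price for the GPT models in general and not for a specific entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One leaderboard row: model name, score in percent, quoted cost in USD."""
    name: str
    score_pct: float
    cost_usd: float


def points_per_dollar(entry: Entry) -> float:
    """Score points obtained per dollar of quoted cost (higher is better)."""
    if entry.cost_usd <= 0:
        raise ValueError("cost must be positive")
    return entry.score_pct / entry.cost_usd


def rank_by_value(entries: list[Entry]) -> list[Entry]:
    """Return entries sorted from best to worst points per dollar."""
    return sorted(entries, key=points_per_dollar, reverse=True)


ENTRIES = [
    Entry("Gemini 3 Deep Think", 84.6, 13.0),  # figures quoted in the summary
    Entry("Claude 4.7 Max", 75.0, 7.0),        # figures quoted in the summary
    # ASSUMPTION: $1.87 is quoted for the GPT models generally,
    # not confirmed for this specific entry.
    Entry("GPT 5.5 XH High", 85.0, 1.87),
]

if __name__ == "__main__":
    for e in rank_by_value(ENTRIES):
        print(f"{e.name}: {points_per_dollar(e):.2f} points per dollar")
```

A single ratio hides the absolute score, so it is best read alongside the raw percentages: a cheap model with a low score can rank well here and still be unfit for a demanding task.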
In practice
- Start with GPT 5.5 XH High when top performance matters most.
- Evaluate Gemini 3 Deep Think if its near-top score justifies the higher cost.
- Consider Claude 4.7 Max (75% at $7) as a middle option on both score and cost.
Topics
- ARC-AGI-2 Leaderboard
- GPT 5.5 Series
- Gemini 3 Deep Think
- Anthropic Claude
- AI Model Performance
Best for: CTO, VP of Engineering/Data, AI Engineer, AI Scientist, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.