AI IQ is here: a new site scores frontier AI models on the human IQ scale. The results are already dividing tech.
Summary
AI IQ, a new startup project, assigns estimated intelligence quotients (IQ) and emotional intelligence (EQ) scores to over 50 large language models (LLMs) and visualizes them on a bell curve and scatter plots. Launched by Ryan Shea, co-founder of Stacks, the platform aggregates 12 benchmarks into four reasoning dimensions: abstract, mathematical, programmatic, and academic. The composite IQ is a straight average of these dimensions, with scores mapped using hand-calibrated difficulty curves that compress ceilings for easier benchmarks. As of mid-May 2026, OpenAI's GPT-5.5 leads with an IQ near 136, closely followed by Anthropic's Opus 4.7 (IQ 132, EQ 132) and Google's Gemini 3.1 Pro (IQ 131). The platform also includes an "Effective Cost" metric, revealing that top-tier models like GPT-5.5 and Opus 4.7 have per-task costs exceeding $30 and $50, respectively, while models like DeepSeek-V3.2 offer respectable IQs (112-120) for $1-$5 per task.
Key takeaway
AI engineers evaluating LLMs for enterprise deployment should prioritize a multi-dimensional assessment that covers cognitive performance (IQ), emotional intelligence (EQ), and, critically, effective cost. Because the intelligence gap between high-cost and mid-range models is narrowing, model routing becomes worthwhile: deploy expensive models only for complex tasks and more economical options for routine workloads, optimizing both performance and budget.
Key insights
AI IQ provides a unified framework for comparing LLMs on IQ, EQ, and cost, though its methodology remains debated.
Principles
- AI capabilities are "jagged" and not easily reducible to a single score.
- Cost-performance is crucial for enterprise AI deployments.
- Emotional intelligence is an emerging factor in model utility.
Method
AI IQ aggregates 12 benchmarks into four reasoning dimensions (abstract, mathematical, programmatic, and academic) and takes the composite IQ as a straight average of those four dimension scores. EQ is a 50/50 weighted composite of EQ-Bench 3 Elo and Arena Elo scores, with a bias correction applied to Anthropic models.
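The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the site's actual implementation: the function names are hypothetical, benchmark scores are assumed to be already mapped onto the IQ scale (the hand-calibrated difficulty curves are not reproduced here), benchmarks within a dimension are assumed to be combined with a plain mean, and the Anthropic bias correction is represented by a placeholder `adjustment` whose size and direction the summary does not state.

```python
from statistics import mean

# The four reasoning dimensions named in the summary.
DIMENSIONS = ("abstract", "mathematical", "programmatic", "academic")


def composite_iq(benchmark_iqs):
    """Return (composite, per_dimension) from IQ-scale benchmark scores.

    benchmark_iqs maps each dimension name to a list of benchmark scores
    that are ALREADY on the IQ scale (assumption: the calibration curves
    that produce those scores are not modeled here).
    """
    missing = [d for d in DIMENSIONS if not benchmark_iqs.get(d)]
    if missing:
        raise ValueError(f"no scores for dimensions: {missing}")
    # Assumption: benchmarks inside one dimension are combined with a plain mean.
    per_dimension = {d: mean(benchmark_iqs[d]) for d in DIMENSIONS}
    # From the summary: the composite is a straight average of the four dimensions.
    composite = mean([per_dimension[d] for d in DIMENSIONS])
    return composite, per_dimension


def composite_eq(eq_bench_score, arena_score, adjustment=0.0):
    """50/50 blend of two EQ-scale scores.

    `adjustment` is a hypothetical stand-in for the bias correction; the
    default of 0.0 applies no correction.
    """
    return 0.5 * eq_bench_score + 0.5 * arena_score + adjustment


# Made-up numbers, for illustration only.
iq, by_dimension = composite_iq({
    "abstract": [120, 130],
    "mathematical": [140],
    "programmatic": [130, 130, 130],
    "academic": [125],
})
print(iq, by_dimension)
print(composite_eq(120, 130))
```

Averaging dimensions rather than all 12 benchmarks directly means a dimension backed by many benchmarks carries no more weight than one backed by a single benchmark.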
In practice
- Compare LLMs using IQ, EQ, and Effective Cost metrics.
- Implement model routing for cost-efficient AI deployments.
- Consider models with strong EQ for user-facing applications.
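One way to read the routing advice above is as a simple selection rule: pick the cheapest model whose score meets what the task needs. The sketch below assumes exactly that rule. The `ModelOption` type, the `route` function, the model names, and all numbers are hypothetical placeholders, not figures from AI IQ, and a production router would likely also weigh latency, EQ, and per-dimension scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str             # hypothetical model label
    iq: float             # composite IQ-style score
    cost_per_task: float  # effective cost in dollars per task


def route(models, required_iq):
    """Return the cheapest model whose IQ meets the requirement.

    If no model qualifies, fall back to the highest-IQ model available.
    """
    if not models:
        raise ValueError("model catalog is empty")
    eligible = [m for m in models if m.iq >= required_iq]
    if not eligible:
        return max(models, key=lambda m: m.iq)
    return min(eligible, key=lambda m: m.cost_per_task)


# Placeholder catalog, for illustration only.
catalog = [
    ModelOption("premium-model", iq=135, cost_per_task=40.0),
    ModelOption("mid-range-model", iq=118, cost_per_task=3.0),
    ModelOption("small-model", iq=100, cost_per_task=0.5),
]

print(route(catalog, required_iq=110).name)  # routine task
print(route(catalog, required_iq=130).name)  # complex task
```

The threshold per task type is the part that needs real tuning; the rule itself only guarantees that the expensive model is used when nothing cheaper clears the bar.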
Topics
- AI IQ
- Language Model Benchmarking
- AI Intelligence Quotient
- AI Emotional Intelligence
- Cost-Performance Analysis
Best for: AI Engineer, Machine Learning Engineer, NLP Engineer, Director of AI/ML, AI Architect, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.