Intelligence Per Dollar
Summary
Microsoft recently added "average token usage" to a model release card, for its MAI-Code-1-Flash model. Reporting token consumption next to performance signals a shift in AI benchmarking toward cost-efficiency: the Microsoft model scored 71.6 on SWE-Bench Verified using approximately one-third of the tokens consumed by Claude Haiku 4.5. The era of judging models on performance alone is ending, because even major companies face significant AI spending pressure. Uber capped employee AI spending after four months, and Salesforce is spending \$300M on Anthropic tokens while freezing engineering hires. Independent trackers such as Artificial Analysis already make this kind of comparison: GPT 5.5 and Claude Opus 4.8 both score around 60 on the Intelligence Index but cost \$3,357 and \$4,685 respectively, a roughly 40% price difference for similar intelligence. For buyers, "intelligence per dollar" is becoming the paramount metric.
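The price comparison in the summary can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes both models score exactly 60 (the summary says "around 60"), and the function name and the "points per \$1,000" unit are choices made here, not part of any published metric.

```python
def points_per_thousand_usd(score: float, cost_usd: float) -> float:
    """Benchmark points bought per $1,000 of cost (illustrative unit)."""
    return score / cost_usd * 1000


# Figures from the summary; the exact score of 60.0 is an assumption.
gpt_score, gpt_cost = 60.0, 3357.0
opus_score, opus_cost = 60.0, 4685.0

gpt_value = points_per_thousand_usd(gpt_score, gpt_cost)
opus_value = points_per_thousand_usd(opus_score, opus_cost)

# Relative price premium of the more expensive model at equal score.
premium_pct = (opus_cost / gpt_cost - 1) * 100

print(f"GPT 5.5:         {gpt_value:.1f} points per $1,000")
print(f"Claude Opus 4.8: {opus_value:.1f} points per $1,000")
print(f"Price premium:   {premium_pct:.1f}%")
```

With equal scores, the ratio of costs alone determines the result, which is why the premium comes out just under 40%.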
Key takeaway
If you are an AI Product Manager evaluating new models or optimizing existing deployments, shift your focus from raw performance to "intelligence per dollar." Prioritize models that deliver comparable results with significantly lower token usage, the dimension Microsoft's new metric exposes. Doing so helps you manage escalating AI costs, avoid budget overruns like Uber's, and tie AI investment to cost-effective business outcomes, with pricing aligned to customer outcomes rather than tokens.
Key insights
AI model evaluation is shifting from raw performance to cost-efficiency, prioritizing "intelligence per dollar."
Principles
- Benchmarks now require dual dimensions: performance and cost.
- AI spending is under scrutiny, even for large enterprises.
- Pricing models must align with customer outcomes, not tokens.
In practice
- Evaluate models using "intelligence per dollar" metrics.
- Monitor token usage alongside benchmark scores.
- Re-evaluate current AI spending against outcomes.
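One way to put these steps into practice is to track token usage and benchmark score together and pick the cheapest model that clears a quality bar. The sketch below is a minimal illustration, not a method from the original article: the `Candidate` fields, the single blended token price, the "cost per solved task" formula, and every number in it are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    score_pct: float               # benchmark score on a 0-100 scale
    avg_tokens_per_task: float     # average tokens consumed per task
    usd_per_million_tokens: float  # blended price (a simplification)

    @property
    def usd_per_task(self) -> float:
        return self.avg_tokens_per_task * self.usd_per_million_tokens / 1_000_000

    @property
    def usd_per_solved_task(self) -> float:
        # Failed attempts still cost money, so divide by the success rate.
        return self.usd_per_task / (self.score_pct / 100)


def cheapest_meeting_bar(
    candidates: Iterable[Candidate], min_score_pct: float
) -> Optional[Candidate]:
    """Return the lowest cost-per-solved-task model at or above the bar."""
    eligible = [c for c in candidates if c.score_pct >= min_score_pct]
    return min(eligible, key=lambda c: c.usd_per_solved_task, default=None)


# Hypothetical models: similar scores, one using a third of the tokens.
candidates = [
    Candidate("lean-model", 70.0, 100_000, 1.0),
    Candidate("verbose-model", 72.0, 300_000, 1.0),
]

choice = cheapest_meeting_bar(candidates, min_score_pct=65.0)
print(choice.name if choice else "no model meets the bar")
```

Raising `min_score_pct` above the leaner model's score changes the answer, which makes the quality threshold an explicit business decision rather than a side effect of picking the top benchmark score.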
Topics
- AI Cost Optimization
- Model Benchmarking
- Token Usage
- Enterprise AI Spending
- MAI-Code-1-Flash
- Claude Haiku
Best for: CTO, VP of Engineering/Data, AI Architect, Director of AI/ML, AI Product Manager, Consultant
Editorial summary, takeaway, and curation by AIssential. Original article published by Tomasz Tunguz.