Intelligence Per Dollar

· Source: Tomasz Tunguz · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Intermediate, quick

Summary

Microsoft added "average token usage" as a metric on the release card for its MAI-Code-1-Flash model. Reporting token consumption alongside performance signals a shift in AI benchmarking toward cost-efficiency: the model scored 71.6 on SWE-Bench Verified while using approximately one-third of the tokens consumed by Claude Haiku 4.5. The era of judging models on performance alone is ending, as even large companies struggle with AI spending. Uber capped employee AI spending after four months, and Salesforce is spending $300M on Anthropic tokens while freezing engineering hires. Independent evaluators such as Artificial Analysis already report cost alongside capability: GPT 5.5 and Claude Opus 4.8 both score around 60 on the Intelligence Index but cost $3,357 and $4,685 respectively, a price difference of roughly 40% for similar intelligence. For buyers, "intelligence per dollar" is becoming the deciding metric.
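The comparison above can be worked through as simple arithmetic. The following is a minimal sketch, not a formula from the original article: the function names are hypothetical, the two cost figures are the ones quoted in the summary, and the score of exactly 60.0 for both models is an assumption standing in for "around 60".

```python
def intelligence_per_dollar(score: float, cost_usd: float) -> float:
    """Return benchmark points obtained per dollar of cost."""
    if cost_usd <= 0:
        raise ValueError("cost_usd must be positive")
    return score / cost_usd


def price_premium(cheaper_usd: float, pricier_usd: float) -> float:
    """Return the extra cost of the pricier option as a fraction of the cheaper one."""
    if cheaper_usd <= 0:
        raise ValueError("cheaper_usd must be positive")
    return (pricier_usd - cheaper_usd) / cheaper_usd


# Costs are taken from the summary; the score of 60.0 for both models
# is an assumption standing in for "around 60".
models = {
    "GPT 5.5": {"score": 60.0, "cost_usd": 3357.0},
    "Claude Opus 4.8": {"score": 60.0, "cost_usd": 4685.0},
}

for name, m in models.items():
    ratio = intelligence_per_dollar(m["score"], m["cost_usd"])
    print(f"{name}: {ratio * 1000:.2f} index points per $1,000")

premium = price_premium(3357.0, 4685.0)
print(f"Price premium: {premium:.1%}")
```

Under these assumptions the script reports about 17.87 index points per $1,000 for GPT 5.5 against 12.81 for Claude Opus 4.8, and a premium of 39.6%, which the summary rounds to 40%.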

Key takeaway

For AI Product Managers evaluating new models or optimizing existing deployments, the focus must shift from raw performance to "intelligence per dollar." Prioritize models that deliver comparable results with significantly lower token usage, the property Microsoft's new metric exposes. Doing so helps you manage escalating AI costs, avoid budget overruns like Uber's, and ensure that AI investments translate into cost-effective business outcomes that align with customer-centric pricing.

Key insights

AI model evaluation is shifting from raw performance to cost-efficiency, prioritizing "intelligence per dollar."

Best for: CTO, VP of Engineering/Data, AI Architect, Director of AI/ML, AI Product Manager, Consultant

Editorial summary, takeaway, and curation by AIssential. Original article published by Tomasz Tunguz.