Can tech companies learn to love cheaper AI models?
Summary
The AI industry is rethinking its long-held assumption that bigger models are inherently better. Mounting costs are pushing users toward smaller, cheaper AI models, and Coinbase co-founder Brian Armstrong predicts that 80% of workloads will run on models that are 99% cheaper within 12 to 18 months. This re-evaluation challenges the traditional "scaling-first" approach, which prioritized ever larger, compute-intensive models. Early tests support the shift: legal AI tool Harvey, working with Fireworks AI, cut inference costs by 3x without compromising quality by strategically combining models such as Claude Opus and Fireworks' GLM 5.1. The change in economics could significantly affect large AI labs like OpenAI and Anthropic, as the definition of quality expands to include efficiency alongside accuracy.
Key takeaway
For AI product managers evaluating model deployment strategies, the rising cost of large models calls for a change in approach: run the majority of workloads on smaller, more efficient models and reserve frontier models for truly intensive tasks. This approach, exemplified by Harvey's 3x cost reduction, can lower inference expenses without compromising output quality, which directly affects budgets and project viability.
Key insights
Rising costs are shifting AI model strategy from "bigger is better" to "efficient is optimal," prioritizing smaller models for most workloads.
Principles
- AI quality now includes efficiency.
- Most workloads suit cheaper models.
- Cost pressure redefines model selection.
Method
Strategically combine powerful frontier models for intensive tasks with cheaper, smaller models for general workloads to optimize cost without sacrificing quality.
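A minimal sketch of what such a split could look like in code, assuming each task arrives with a complexity score between 0 and 1. The tier names, prices, scoring scheme, and 0.8 threshold are all hypothetical and are not taken from Harvey, Fireworks AI, or any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical label, not a real model ID
    cost_per_1k_tokens: float  # illustrative price, not a quoted rate


# Both tiers and their prices are assumptions chosen for illustration.
FRONTIER = ModelTier("frontier-model", 0.03)
SMALL = ModelTier("small-model", 0.0003)


def pick_tier(complexity: float, threshold: float = 0.8) -> ModelTier:
    """Send a task to the frontier tier only when its score reaches the threshold."""
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be between 0 and 1")
    return FRONTIER if complexity >= threshold else SMALL


def estimate_cost(complexities: list[float], tokens_per_task: int = 1000) -> float:
    """Sum the illustrative cost of routing each task to its chosen tier."""
    return sum(
        pick_tier(c).cost_per_1k_tokens * tokens_per_task / 1000
        for c in complexities
    )
```

In a real deployment the hard part is producing the complexity score and checking that the cheaper tier's output quality holds up; this sketch leaves both out.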
In practice
- Deploy smaller models for 80% of tasks.
- Combine models for cost-effective inference.
- Redefine "quality" to include efficiency.
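To make the arithmetic behind the 80% figure concrete, the sketch below computes the blended cost of a mixed deployment relative to running everything on the expensive model. It assumes every workload costs the same on the expensive model, which is a simplification that the article does not state; the function name and parameters are illustrative.

```python
def blended_cost_ratio(share_moved: float, discount: float) -> float:
    """Cost of a mixed deployment as a fraction of the all-expensive baseline.

    share_moved: fraction of workloads moved to the cheaper model (0 to 1)
    discount:    how much cheaper that model is, as a fraction (0.99 = 99% cheaper)
    """
    for value in (share_moved, discount):
        if not 0.0 <= value <= 1.0:
            raise ValueError("share_moved and discount must be between 0 and 1")
    # Workloads left on the expensive model keep full cost;
    # moved workloads pay only the undiscounted remainder.
    return (1.0 - share_moved) + share_moved * (1.0 - discount)


# Armstrong's predicted split: 80% of workloads on models that are 99% cheaper.
ratio = blended_cost_ratio(share_moved=0.80, discount=0.99)
print(f"blended cost: {ratio:.3f} of baseline, about {1 / ratio:.1f}x cheaper")
```

Under that equal-cost assumption, the predicted split gives a blended cost of roughly 0.208 of the baseline, or about a 4.8x reduction. The 20% of workloads that stay on the frontier model then account for nearly all of the remaining spend.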
Topics
- AI Economics
- Smaller AI Models
- Inference Cost Optimization
- Model Deployment Strategy
- Brian Armstrong Prediction
- Hybrid AI Architectures
Best for: CTO, VP of Engineering/Data, AI Engineer, Director of AI/ML, AI Product Manager, Consultant
Editorial summary, takeaway, and curation by AIssential. Original article published by AI News & Artificial Intelligence | TechCrunch.