Can tech companies learn to love cheaper AI models?

2026-06-09 · Source: AI News & Artificial Intelligence | TechCrunch · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, short

Summary

The AI industry is moving away from its long-held assumption that bigger models are inherently better. Mounting costs are pushing users toward smaller, cheaper AI models, a trend predicted by Coinbase co-founder Brian Armstrong, who expects 80% of workloads to run on models that are 99% cheaper within 12 to 18 months. This re-evaluation challenges the traditional "scaling-first" approach, which prioritized training compute-intensive models. Early results point the same way: a test by legal AI tool Harvey, in partnership with Fireworks AI, cut inference costs threefold without compromising quality by strategically combining models such as Claude Opus and GLM 5.1 (served via Fireworks). This economic shift could significantly affect large AI labs like OpenAI and Anthropic, as the definition of quality evolves to weigh efficiency alongside accuracy.
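Armstrong's prediction implies a simple piece of arithmetic. The Python sketch below is a back-of-the-envelope model, assuming every workload costs the same on the frontier model and that the remaining 20% of work stays at frontier prices; `blended_cost_ratio` is a hypothetical helper, not part of any library.

```python
def blended_cost_ratio(cheap_share: float, cheap_discount: float) -> float:
    """Return total cost as a fraction of an all-frontier baseline.

    cheap_share: fraction of workloads moved to the cheaper model (0..1).
    cheap_discount: how much cheaper that model is per workload (0.99 = 99% cheaper).
    Assumption: every workload costs the same on the frontier model.
    """
    if not (0.0 <= cheap_share <= 1.0 and 0.0 <= cheap_discount <= 1.0):
        raise ValueError("shares and discounts must lie in [0, 1]")
    frontier_part = 1.0 - cheap_share                  # work still billed at frontier prices
    cheap_part = cheap_share * (1.0 - cheap_discount)  # work billed at the discounted price
    return frontier_part + cheap_part


ratio = blended_cost_ratio(0.80, 0.99)
print(f"cost vs. baseline: {ratio:.3f} ({1 / ratio:.1f}x cheaper)")
# prints "cost vs. baseline: 0.208 (4.8x cheaper)"
```

Under these assumptions, total spend falls to about 21% of the baseline (roughly 4.8x cheaper, not 100x), and the 20% of work left on the frontier model accounts for about 96% of what remains. The savings ceiling is therefore set mostly by how much work can actually leave the frontier tier.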

Key takeaway

For AI Product Managers evaluating model deployment strategies, the rising cost of large models calls for a change in approach. Consider deploying smaller, more efficient models for most of your workloads and reserving frontier models for genuinely demanding tasks. As Harvey's threefold reduction in inference costs suggests, this approach can cut inference spending substantially while maintaining output quality, with a direct effect on your budget and project viability.

Key insights

Rising costs are shifting AI model strategy from "bigger is better" to "efficient is optimal," prioritizing smaller models for most workloads.

Principles

AI quality now includes efficiency.
Most workloads suit cheaper models.
Cost pressure redefines model selection.

Method

Strategically combine powerful frontier models for intensive tasks with cheaper, smaller models for general workloads to optimize cost without sacrificing quality.
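One possible way to implement this split is a threshold router. The Python sketch below is an illustration under stated assumptions, not Harvey's or Fireworks' actual method, which the article does not describe: the tier names and relative costs are hypothetical, and each task is assumed to arrive with a 0-to-1 difficulty score from some upstream classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str             # hypothetical label, not a real model ID
    cost_per_call: float  # hypothetical relative cost units


SMALL = ModelTier("small-model", 1.0)
FRONTIER = ModelTier("frontier-model", 30.0)


def route(task_difficulty: float, threshold: float = 0.8) -> ModelTier:
    """Send tasks at or above the threshold to the frontier tier, the rest to the small tier.

    task_difficulty is assumed to be a 0..1 score from an upstream classifier
    (hypothetical; producing a reliable score is the hard part in practice).
    """
    return FRONTIER if task_difficulty >= threshold else SMALL


def total_cost(difficulties: list[float], threshold: float = 0.8) -> float:
    # Sum the per-call cost of whichever tier each task is routed to.
    return sum(route(d, threshold).cost_per_call for d in difficulties)


workload = [0.1, 0.3, 0.5, 0.2, 0.9, 0.4, 0.95, 0.6, 0.7, 0.15]  # made-up scores
hybrid = total_cost(workload)
all_frontier = FRONTIER.cost_per_call * len(workload)
print(f"hybrid: {hybrid}, all-frontier: {all_frontier}, saving: {all_frontier / hybrid:.1f}x")
# prints "hybrid: 68.0, all-frontier: 300.0, saving: 4.4x"
```

With this made-up workload, two of ten tasks cross the 0.8 threshold, so the hybrid cost is 68 units versus 300 for sending everything to the frontier tier. The weak point is the difficulty score itself: a router that misjudges hard tasks as easy saves money at the expense of quality, which is why cascades that escalate low-confidence answers to the larger model are one common alternative.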

In practice

Route routine tasks (up to ~80%, per Armstrong's estimate) to smaller models.
Combine models for cost-effective inference.
Redefine "quality" to include efficiency.

Topics

AI Economics
Smaller AI Models
Inference Cost Optimization
Model Deployment Strategy
Brian Armstrong Prediction
Hybrid AI Architectures

Best for: CTO, VP of Engineering/Data, AI Engineer, Director of AI/ML, AI Product Manager, Consultant


Editorial summary, takeaway, and curation by AIssential. Original article published by AI News & Artificial Intelligence | TechCrunch.