A Theory of Training Profit-Optimal LLMs
Summary
Sophie Hao and William Merrill develop an economic model of profit-optimal training strategies for Large Language Models (LLMs), combining scaling laws with microeconomic theory. The model weighs LLM quality, which rises with parameter count ($n$) and training tokens ($d$) and drives consumer adoption, against training costs that rise with both. They analyze profit maximization in two regimes: compute-bound and data-bound. In the compute-bound regime, the optimal model size and token budget scale near-linearly with hardware efficiency ($E$), while total training cost scales sub-quadratically in $E$; improvements in data efficiency favor larger models and higher training expenditure. In the data-bound regime, optimal training expenditure scales quadratically with available data ($D$) but decreases with hardware efficiency. Current empirical trends in training expenditure are consistent with the most permissive compute-bound variants of the model, but they exceed profit-optimal levels in the data-bound regime or if hardware advances stall.
Key takeaway
Entrepreneurs and investors weighing LLM development strategies should recognize that the current exponential growth in training compute may not be profit-optimal under all conditions. Investment decisions should account for whether operations are compute-bound or data-bound, because optimal scaling behavior and expenditure differ significantly between the two regimes. If hardware efficiency gains slow, current spending rates could exceed profit-optimal bounds, which would call for a shift toward data efficiency and smaller models.
Key insights
Profit-optimal LLM training balances quality-driven demand against scaling costs, and the optimum differs significantly depending on whether training is compute-bound or data-bound.
Principles
- LLM quality improvements exhibit diminishing returns.
- Optimal scaling depends on hardware and data efficiency.
- Model size and data are complements for quality.
Method
The method formalizes LLM profit maximization using a quasilinear inverse demand function and a Leontief scaling law for quality, then solves for the optimal model size and token budget ($n^*$, $d^*$) under compute-bound and data-bound constraints.
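To make the shape of this optimization concrete, here is a minimal numerical sketch in Python. It is a toy stand-in, not the authors' model. Every functional form and constant is an assumption chosen for illustration: power-law quality terms combined with `min` (a Leontief-style form), a concave `log1p` revenue curve in place of the paper's quasilinear inverse demand function, and a training cost proportional to $6nd/E$, which uses the common $6nd$ FLOPs approximation. All names (`quality`, `profit`, `optimize`, `A`, `ALPHA`, `V`, `C_FLOP`, and so on) are hypothetical. The sketch is not expected to reproduce the scaling exponents reported in the summary.

```python
import math

# All constants and functional forms are illustrative assumptions,
# not values or equations taken from the paper. Units are arbitrary.
A, ALPHA = 1.0, 0.3   # hypothetical parameter term: A * n**ALPHA
B, BETA = 1.0, 0.3    # hypothetical token term:     B * d**BETA
V = 1e9               # hypothetical revenue scale
C_FLOP = 1e-9         # hypothetical cost per FLOP when E = 1


def quality(n, d):
    """Leontief-style quality: the scarcer input is the bottleneck."""
    return min(A * n**ALPHA, B * d**BETA)


def profit(n, d, E):
    """Assumed concave revenue in quality minus an assumed 6*n*d/E cost."""
    revenue = V * math.log1p(quality(n, d))
    cost = C_FLOP * 6.0 * n * d / E
    return revenue - cost


def balanced_inputs(q):
    """Smallest (n, d) reaching quality q; under Leontief, excess is waste."""
    return (q / A) ** (1 / ALPHA), (q / B) ** (1 / BETA)


def optimize(E, D=None, steps=4000):
    """Grid-search quality on a log scale; D caps tokens (data-bound case)."""
    best = None
    for i in range(steps + 1):
        q = 10 ** (-2 + 6 * i / steps)   # quality grid from 1e-2 to 1e4
        n, d = balanced_inputs(q)
        if D is not None and d > D:
            break                        # d grows with q, so stop here
        p = profit(n, d, E)
        if best is None or p > best[0]:
            best = (p, n, d)
    return best                          # (profit, n_star, d_star)


if __name__ == "__main__":
    for E in (1.0, 10.0):
        p, n, d = optimize(E)
        print(f"compute-bound E={E}: n*={n:.3g}, d*={d:.3g}, profit={p:.3g}")
    p, n, d = optimize(1.0, D=1e6)
    print(f"data-bound D=1e6: n*={n:.3g}, d*={d:.3g}, profit={p:.3g}")
```

Because the `min` form makes any input beyond the balanced pair wasteful, the search reduces to one dimension over the target quality. In this toy setup, raising `E` pushes the optimal `n` and `d` upward, and a binding data cap `D` pins `d` near the cap and sizes the model to match.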
In practice
- Evaluate LLM scaling decisions based on compute vs. data availability.
- Prioritize data efficiency improvements in compute-bound scenarios.
- Re-evaluate investment if hardware efficiency plateaus.
Topics
- LLM Profit Optimization
- Scaling Laws
- Compute-Bound Training
- Data-Bound Training
- Hardware Efficiency
Best for: Research Scientist, Investor, Entrepreneur, AI Scientist, Director of AI/ML, Executive
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.