I Fine-Tuned One Model 3 Ways: The $50,000 Run Forgot More Than the $1,500 One
Summary
An analysis of fine-tuning an 8B model with full fine-tuning, LoRA, and QLoRA revealed a critical trade-off. The most expensive method, full fine-tuning on $50,000 H100s, narrowly won on the target benchmark but significantly degraded the model's general capabilities. That degradation was more pronounced than with cheaper methods such as LoRA, which was trained on a $1,500 RTX 4090. The findings challenge the common perception that full fine-tuning is always the superior "gold standard": it learns more, but it also forgets more. LoRA, conversely, learns less but forgets less, which suggests the two methods produce structurally different models even when target-task performance appears similar.
Key takeaway
For AI engineers and teams managing significant GPU budgets for model fine-tuning, critically re-evaluate the assumption that full fine-tuning is inherently superior. Your expensive H100 runs might achieve target benchmark gains but could silently degrade the model's broader capabilities more than cheaper LoRA or QLoRA approaches. Prioritize comprehensive evaluation of general ability alongside task-specific metrics to avoid costly regressions.
Key insights
- Expensive full fine-tuning can degrade a model's general ability more than cheaper parameter-efficient methods.
Principles
- Full fine-tuning learns more and forgets more.
- LoRA learns less and forgets less.
- LoRA and full fine-tuning yield structurally different models.
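One way to see why the two methods can end up structurally different is to count what each is allowed to change. Full fine-tuning updates every entry of a weight matrix, while LoRA learns a low-rank update made of two thin matrices. The sketch below compares trainable parameters for a single layer. The layer size (4096 x 4096) and rank (16) are illustrative assumptions, not figures from the original article.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable entries when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable entries for a LoRA update B @ A,
    where B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


# Assumed, illustrative dimensions (not taken from the article).
d_out = d_in = 4096
rank = 16

full = full_update_params(d_out, d_in)
lora = lora_update_params(d_out, d_in, rank)
ratio = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters")
print(f"LoRA / full: {ratio:.4%}")
```

Under these assumptions the LoRA update touches under 1% as many parameters for that layer and is constrained to rank 16 at most, which is consistent with the "learns less, forgets less" framing above.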
In practice
- Reproduce cheap fine-tuning on a single consumer GPU.
- Evaluate model general ability beyond target benchmarks.
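The second point can be made concrete with a small report that tracks "learned" and "forgotten" side by side. This is a minimal sketch, not the article's evaluation harness: `EvalResult`, `learned_and_forgotten`, the suite names, and every score below are hypothetical placeholders chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    target: float               # score on the task you fine-tuned for
    general: dict[str, float]   # scores on held-out general-ability suites


def learned_and_forgotten(base: EvalResult, tuned: EvalResult) -> tuple[float, float]:
    """Return (gain on the target task, mean drop across general suites)."""
    learned = tuned.target - base.target
    drops = [base.general[name] - tuned.general[name] for name in base.general]
    forgotten = sum(drops) / len(drops)
    return learned, forgotten


# Placeholder scores, invented for illustration only.
base = EvalResult(target=40.0, general={"suite_a": 60.0, "suite_b": 50.0})
tuned = EvalResult(target=55.0, general={"suite_a": 54.0, "suite_b": 48.0})

learned, forgotten = learned_and_forgotten(base, tuned)
print(f"learned: +{learned:.1f} on target, forgotten: -{forgotten:.1f} on general suites")
```

Reporting both numbers for every run, rather than the target score alone, is what surfaces the kind of silent regression the summary describes.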
Topics
- Fine-tuning
- LoRA
- QLoRA
- Model Degradation
- GPU Performance
- Parameter-Efficient Fine-Tuning
Best for: Machine Learning Engineer, AI Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.