I Fine-Tuned One Model 3 Ways: The $50,000 Run Forgot More Than the $1,500 One

· Source: Machine Learning on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, quick

Summary

An 8B model was fine-tuned three ways: full fine-tuning, LoRA, and QLoRA. The comparison revealed a trade-off. The most expensive method, full fine-tuning on $50,000 H100s, won narrowly on the target benchmark but significantly degraded the model's general capabilities, and did so more than cheaper methods such as LoRA, which was trained on a $1,500 RTX 4090. The findings challenge the common perception that full fine-tuning is always the "gold standard": it learns more, but it also forgets more. LoRA learns less but forgets less, which suggests that the two methods produce structurally different models even when target-task performance looks similar.
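One way to see why the two methods could end up with structurally different models is to count what each is allowed to change. The sketch below is illustrative only: it assumes the standard LoRA formulation (a frozen weight matrix plus a trainable low-rank product), and the layer size and rank are hypothetical values, not figures from the original article.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix W (d_out x d_in) is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters when W is frozen and only the low-rank factors
    B (d_out x rank) and A (rank x d_in) are trained."""
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration; not taken from the article.
    d_out, d_in, rank = 4096, 4096, 16
    full = full_finetune_params(d_out, d_in)
    lora = lora_params(d_out, d_in, rank)
    print(f"full fine-tuning: {full:,} trainable parameters in this layer")
    print(f"LoRA (rank {rank}): {lora:,} trainable parameters in this layer")
    print(f"LoRA trains {lora / full:.2%} of what full fine-tuning trains")
```

Under these assumptions the LoRA update is confined to a rank-16 subspace of the layer, which is at least consistent with the "learns less, forgets less" pattern described above.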

Key takeaway

AI engineers and teams managing significant GPU budgets for fine-tuning should re-evaluate the assumption that full fine-tuning is inherently superior. Expensive H100 runs might achieve target benchmark gains while silently degrading the model's broader capabilities more than cheaper LoRA or QLoRA approaches would. Evaluate general ability alongside task-specific metrics to catch costly regressions.
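The advice to track general ability next to the target metric can be sketched as a small report that compares a fine-tuned checkpoint against its base model. This is a minimal sketch, not the original author's evaluation code: the function name `learn_forget_report`, the benchmark names, and every score in the usage example are hypothetical placeholders, and it assumes higher scores are better on every benchmark.

```python
from statistics import mean


def learn_forget_report(base: dict, tuned: dict, target_task: str) -> dict:
    """Compare a fine-tuned model to its base model.

    base / tuned map benchmark name -> score (assumed: higher is better).
    "Learning" is the gain on the target task; "forgetting" is the average
    score drop across all other (general-ability) benchmarks.
    """
    if base.keys() != tuned.keys():
        raise ValueError("base and tuned must cover the same benchmarks")
    general = [name for name in base if name != target_task]
    if target_task not in base or not general:
        raise ValueError("need the target task plus at least one general benchmark")

    # Positive drop means the fine-tuned model got worse on that benchmark.
    drops = {name: base[name] - tuned[name] for name in general}
    return {
        "target_gain": tuned[target_task] - base[target_task],
        "mean_forgetting": mean(drops.values()),
        "worst_regression": max(drops, key=drops.get),
    }


if __name__ == "__main__":
    # Placeholder scores for illustration only; NOT results from the article.
    base = {"target_task": 50.0, "general_a": 60.0, "general_b": 70.0}
    run = {"target_task": 55.0, "general_a": 58.0, "general_b": 64.0}
    print(learn_forget_report(base, run, target_task="target_task"))
```

Running the same report for each method (full fine-tuning, LoRA, QLoRA) puts the target gain and the general-ability cost side by side, so a narrow benchmark win cannot hide a larger regression elsewhere.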

Key insights

Expensive full fine-tuning can degrade a model's general ability more than cheaper parameter-efficient methods.

Best for: Machine Learning Engineer, AI Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.