Understanding post-training of LLMs: SFT
Summary
Post-training of Large Language Models (LLMs) focuses on refining a pre-trained model's behavior to act as a better assistant, rather than imparting new knowledge. This process primarily involves Supervised Fine-Tuning (SFT) and preference optimization techniques like RLHF or DPO. SFT entails continuing training with labeled prompt-response pairs, shifting the data distribution to guide the model's expression of existing knowledge. While SFT uses the same mathematical objective as pre-training, its application to small datasets can lead to overfitting, causing verbosity, reduced diversity, and increased hallucinations. To mitigate this, techniques like Low-Rank Adaptation (LoRA) are employed, which freeze most of the base model weights and only update a small, low-rank adapter matrix, significantly reducing trainable parameters and preventing catastrophic forgetting. LoRA is typically applied to attention layers (Q, K, V, O) for behavioral changes or MLP layers for domain shifts, allowing for targeted modifications without rewriting core knowledge.
Key takeaway
For AI Engineers optimizing LLM performance and behavior, understanding the targeted application of SFT and LoRA is crucial. You should consider LoRA for fine-tuning to prevent catastrophic forgetting and manage computational costs, especially when aiming for specific behavioral changes like instruction following or domain adaptation. Tailor LoRA application to attention layers for stylistic adjustments or MLP layers for deeper conceptual shifts, reserving full fine-tuning for fundamental model transformations.
Key insights
Post-training guides LLM behavior and expression, not knowledge, primarily via SFT and preference optimization.
Principles
- Post-training refines model behavior, not core knowledge.
- SFT shifts data distribution to guide model expression.
- LoRA enables efficient, stable fine-tuning by updating a low-rank subspace.
Method
SFT involves training a pre-trained model on labeled prompt-response examples, predicting only response tokens. LoRA freezes base weights and updates a low-rank adapter matrix (delta-W ~ AB) where r << d, reducing trainable parameters.
In practice
- Apply LoRA to attention layers for style/formatting changes.
- Modify MLP layers for domain adaptation or new abstraction patterns.
- Use full fine-tuning for extreme shifts like creating a coding model.
Topics
- Supervised Fine-tuning
- Low-Rank Adapters
- Large Language Models
- Post-training
- Attention Mechanisms
Best for: AI Engineer, Machine Learning Engineer, Deep Learning Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by LLM on Medium.