The art and science of hyperparameter optimization on Amazon Nova Forge
Summary
Amazon Nova Forge lets teams build custom frontier models from Amazon Nova by blending proprietary data with curated training data, targeting the specialized tasks where general-purpose LLMs struggle. This post details hyperparameter optimization on Nova Forge, covering both the "art" of strategic trade-offs and the "science" of metric-driven decisions that help avoid expensive failed training runs. It outlines three customization techniques, Continued Pre-training (CPT), Supervised Fine-tuning (SFT), and Reinforcement Fine-tuning (RFT), which can be applied in sequence for the strongest results. Key strategic decisions include checkpoint selection (pre-trained, mid-trained, or post-trained), data mixing (balancing customer and Nova data, especially "reasoning-instruction-following" data for SFT), and training mode (LoRA for cost efficiency, Full Rank for maximum adaptation). The article gives guidance on learning rate, batch size, and RFT-specific parameters, and reports experiments that achieved a 10.75% F1 score improvement on MedReason and 322% on LLaVA-CoT. It also identifies common pitfalls such as catastrophic forgetting and incorrect learning rate usage.
Key takeaway
For AI Engineers customizing LLMs on Amazon Nova Forge, prioritize data and reward function quality before hyperparameter tuning. Start with service defaults for learning rate and data mixing to ensure stability and prevent catastrophic forgetting. If using Continued Pre-training, carefully select checkpoints based on data scale. Validate your pipeline with LoRA before considering Full Rank training for production. Always monitor validation loss to avoid overfitting and ensure general capabilities are preserved.
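The advice to monitor validation loss can be illustrated with a small patience-based check. This is a minimal sketch, not part of Nova Forge: the function name `should_stop`, the `patience` and `min_delta` parameters, and the loss values are all hypothetical and assume you log one validation loss per evaluation step.

```python
def should_stop(val_losses, patience=3, min_delta=0.0):
    """Return True when the last `patience` evaluations failed to beat
    the best earlier validation loss by more than `min_delta`.

    Hypothetical helper for illustration; not a Nova Forge API.
    """
    if len(val_losses) <= patience:
        # Not enough history yet to judge a trend.
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    # No recent evaluation improved on the earlier best: likely overfitting.
    return recent_best > best_before - min_delta


# Illustrative loss curves (made-up numbers).
rising = [1.00, 0.80, 0.70, 0.72, 0.75, 0.90]
falling = [1.00, 0.90, 0.80, 0.70, 0.60]
print(should_stop(rising))
print(should_stop(falling))
```

A check like this only flags a rising validation loss; whether general capabilities are preserved still needs a separate evaluation on general-purpose benchmarks.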
Key insights
Customizing LLMs on Amazon Nova Forge requires balancing strategic choices and systematic hyperparameter tuning to prevent catastrophic forgetting.
Principles
- Data and reward quality are paramount.
- Data mixing prevents catastrophic forgetting.
- Checkpoint selection is most impactful for CPT.
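The data mixing principle can be sketched as sampling a fixed share of each training batch from customer data and the rest from general-purpose data. This is an illustrative sketch only: `mix_datasets`, `customer_fraction`, and the 0.75 ratio are assumptions made for the example, not Nova Forge parameters or recommended values, and the service handles mixing through its own configuration.

```python
import random


def mix_datasets(customer, general, customer_fraction, total, seed=0):
    """Build a shuffled training list in which roughly `customer_fraction`
    of the examples come from `customer` and the rest from `general`.

    Samples with replacement. Hypothetical helper, not a Nova Forge API.
    """
    if not 0.0 <= customer_fraction <= 1.0:
        raise ValueError("customer_fraction must be between 0 and 1")
    if not customer or not general:
        raise ValueError("both datasets must be non-empty")
    rng = random.Random(seed)
    n_customer = round(total * customer_fraction)
    n_general = total - n_customer
    mixed = [rng.choice(customer) for _ in range(n_customer)]
    mixed += [rng.choice(general) for _ in range(n_general)]
    rng.shuffle(mixed)
    return mixed


# Toy data: "c*" stands for customer examples, "g*" for general examples.
mixed = mix_datasets(["c1", "c2"], ["g1", "g2", "g3"], 0.75, total=8)
n_customer = sum(1 for x in mixed if x.startswith("c"))
print(len(mixed), n_customer)
```

Keeping a non-zero share of general data in every mix is what guards against catastrophic forgetting; the right share depends on the task and should start from the service defaults.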
Method
The Nova Forge pipeline consists of CPT (domain knowledge), SFT (task-specific behavior), and RFT (reward-based optimization), ideally run in that sequence; individual stages are optional, depending on the data and task.
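One way to picture the sequencing is a small planner that keeps the CPT, SFT, RFT order and skips any stage whose inputs are missing. This is a sketch under stated assumptions: the `AvailableData` fields and the rule mapping each kind of data to a stage are hypothetical simplifications for illustration, not criteria taken from the original article.

```python
from dataclasses import dataclass


@dataclass
class AvailableData:
    # Hypothetical flags describing what the team has on hand.
    has_domain_corpus: bool      # large body of domain text (assumed CPT input)
    has_labeled_examples: bool   # prompt/response pairs (assumed SFT input)
    has_reward_function: bool    # scoring function (assumed RFT input)


def plan_stages(data):
    """Return the stages to run, always in CPT -> SFT -> RFT order."""
    stages = []
    if data.has_domain_corpus:
        stages.append("CPT")
    if data.has_labeled_examples:
        stages.append("SFT")
    if data.has_reward_function:
        stages.append("RFT")
    return stages


print(plan_stages(AvailableData(True, True, False)))
print(plan_stages(AvailableData(False, True, True)))
```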
In practice
- Start with LoRA to validate pipelines.
- Use service defaults for learning rate.
- Include "reasoning-instruction-following" in the SFT data mix.
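The cost argument for starting with LoRA follows from simple arithmetic: instead of updating a full weight matrix of shape d_out x d_in, LoRA trains two low-rank factors of shapes d_out x r and r x d_in. The sketch below compares the two counts for a single matrix. The dimensions and rank are illustrative assumptions, not Nova model sizes or Nova Forge defaults.

```python
def full_rank_params(d_in, d_out):
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters for LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_in + d_out)


# Illustrative sizes only (assumed for this example).
d_in, d_out, rank = 4096, 4096, 16
full = full_rank_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(full, lora, lora / full)
```

With these assumed sizes, LoRA trains under 1% of the parameters of the full matrix, which is why it suits pipeline validation before committing to a Full Rank run.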
Topics
- Hyperparameter Optimization
- Amazon Nova Forge
- LLM Customization
- Data Mixing
- LoRA Fine-tuning
- Catastrophic Forgetting
Best for: Machine Learning Engineer, AI Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.