The art and science of hyperparameter optimization on Amazon Nova Forge

2026-06-02 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Advanced, extended

Summary

Amazon Nova Forge lets teams build custom frontier models from Amazon Nova by blending proprietary data with curated training data, which addresses the difficulty general-purpose LLMs have with specialized tasks. This post covers hyperparameter optimization on Nova Forge: the "art" of strategic trade-offs and the "science" of metric-driven decisions, both aimed at avoiding expensive failed training runs. It outlines three customization techniques, Continued Pre-training (CPT), Supervised Fine-tuning (SFT), and Reinforcement Fine-tuning (RFT), which give the strongest results when used in sequence. Key strategic decisions are checkpoint selection (pre-trained, mid-trained, or post-trained), data mixing (balancing customer and Nova data, especially "reasoning-instruction-following" data for SFT), and training mode (LoRA for cost efficiency, Full Rank for maximum adaptation). The article gives guidance on learning rate, batch size, and RFT-specific parameters, and reports that experiments achieved a 10.75% F1 score improvement on MedReason and a 322% improvement on LLaVA-CoT. It also identifies common pitfalls such as catastrophic forgetting and incorrect learning rate settings.

Key takeaway

For AI Engineers customizing LLMs on Amazon Nova Forge, prioritize data and reward function quality before hyperparameter tuning. Start with service defaults for learning rate and data mixing to ensure stability and prevent catastrophic forgetting. If using Continued Pre-training, carefully select checkpoints based on data scale. Validate your pipeline with LoRA before considering Full Rank training for production. Always monitor validation loss to avoid overfitting and ensure general capabilities are preserved.
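
The advice to monitor validation loss can be illustrated with a small patience-based check. This is a minimal sketch, not part of Nova Forge: the function name, the patience of 2 evaluations, and the loss values are all hypothetical, and it assumes you already collect one validation loss per evaluation step.

```python
from typing import Sequence, Tuple


def best_checkpoint(
    val_losses: Sequence[float],
    patience: int = 2,      # hypothetical default, not a service setting
    min_delta: float = 0.0,
) -> Tuple[int, bool]:
    """Return (index of the best evaluation, whether to stop training).

    Training is flagged to stop once `patience` consecutive evaluations
    have failed to improve on the best loss by more than `min_delta`.
    """
    if not val_losses:
        raise ValueError("need at least one validation loss")

    best_idx = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_idx] - min_delta:
            best_idx = i

    # Number of evaluations since the best one was recorded.
    stalled = len(val_losses) - 1 - best_idx
    return best_idx, stalled >= patience


# Illustrative losses: improvement stops after the third evaluation.
print(best_checkpoint([1.20, 0.95, 0.90, 0.93, 0.97]))
```

A rising validation loss while training loss keeps falling is the usual overfitting signal, so the checkpoint at the returned index is the one worth keeping.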

Key insights

Customizing LLMs on Amazon Nova Forge requires balancing strategic choices and systematic hyperparameter tuning to prevent catastrophic forgetting.

Principles

Data and reward quality are paramount.
Data mixing prevents catastrophic forgetting.
Checkpoint selection is most impactful for CPT.
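
The data mixing principle can be sketched as sampling a training set from two pools at a fixed ratio. This is an illustration under stated assumptions only: `mix_datasets`, the 70/30 split, and the example records are hypothetical and do not reflect Nova Forge's actual mixing configuration or its defaults, which the article recommends starting from.

```python
import random
from typing import List, Sequence


def mix_datasets(
    customer: Sequence[str],
    general: Sequence[str],
    customer_fraction: float,
    total: int,
    seed: int = 0,
) -> List[str]:
    """Build a shuffled mix with roughly `customer_fraction` customer examples.

    Sampling is done with replacement so that a small pool can be upsampled.
    """
    if not 0.0 <= customer_fraction <= 1.0:
        raise ValueError("customer_fraction must be between 0 and 1")

    rng = random.Random(seed)
    n_customer = round(total * customer_fraction)
    n_general = total - n_customer

    mixed = rng.choices(customer, k=n_customer) + rng.choices(general, k=n_general)
    rng.shuffle(mixed)
    return mixed


# Hypothetical pools; the 0.7 ratio is illustrative, not a recommendation.
customer_pool = ["cust-1", "cust-2"]
general_pool = ["gen-1", "gen-2", "gen-3"]
batch = mix_datasets(customer_pool, general_pool, customer_fraction=0.7, total=10)
print(sum(x.startswith("cust") for x in batch), "customer examples out of", len(batch))
```

Keeping some general-purpose data in the mix is what lets the model keep rehearsing its existing capabilities while it learns the new domain.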

Method

The Nova Forge pipeline consists of CPT (domain knowledge), SFT (task-specific behavior), and RFT (reward-based optimization), ideally run in that order. Each stage is optional, depending on the available data and the task.
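
The staged pipeline can be pictured as a simple planning step that keeps the CPT, SFT, RFT order and drops stages for which there is no input. This is a sketch under assumptions: the function and its three flags are hypothetical, and the mapping from asset type to stage (raw domain text for CPT, labeled examples for SFT, a reward function for RFT) is a simplification rather than Nova Forge's own decision logic.

```python
from typing import List


def plan_stages(
    has_domain_corpus: bool,     # assumed input for CPT
    has_labeled_examples: bool,  # assumed input for SFT
    has_reward_function: bool,   # assumed input for RFT
) -> List[str]:
    """Return the stages to run, always in CPT -> SFT -> RFT order."""
    stages: List[str] = []
    if has_domain_corpus:
        stages.append("CPT")
    if has_labeled_examples:
        stages.append("SFT")
    if has_reward_function:
        stages.append("RFT")
    return stages


# A team with labeled examples and a reward function, but no raw corpus.
print(plan_stages(False, True, True))  # ['SFT', 'RFT']
```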

In practice

Start with LoRA to validate pipelines.
Use service defaults for learning rate.
Include "reasoning-instruction-following" in SFT data mix.
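
One reason LoRA is a cheap way to validate a pipeline is the size of what it trains. For a single weight matrix of shape d_out x d_in, a rank-r LoRA adapter trains r * (d_out + d_in) parameters instead of d_out * d_in. The sketch below works through that arithmetic; the 4096 x 4096 layer and rank 16 are illustrative values, not Nova's architecture or a Nova Forge default.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


def full_rank_params(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole weight matrix is trained."""
    return d_out * d_in


# Illustrative layer size and rank (assumptions, not Nova values).
d_out, d_in, rank = 4096, 4096, 16
lora = lora_trainable_params(d_out, d_in, rank)
full = full_rank_params(d_out, d_in)

print(lora)         # 131072
print(full)         # 16777216
print(lora / full)  # 0.0078125
```

At these sizes the adapter trains under 1% of the layer's parameters, which is why a LoRA run is a reasonable first check before committing to Full Rank training.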

Topics

Hyperparameter Optimization
Amazon Nova Forge
LLM Customization
Data Mixing
LoRA Fine-tuning
Catastrophic Forgetting

Code references

aws/sagemaker-hyperpod-recipes

Best for: Machine Learning Engineer, AI Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.