Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI
Summary
AI agents can autonomously handle complex, multi-step tasks, but their effectiveness depends on calling the right tools to retrieve information or take action. This post demonstrates how to improve AI agent tool-calling accuracy by combining Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) on Amazon SageMaker AI. The process involves fine-tuning the Qwen3-1.7B model using NVIDIA's When2Call dataset, which includes 15,000 SFT samples and 9,000 DPO samples. Evaluation showed Qwen3-1.7B's accuracy increased from 41.57% (base) to 60.43% after SFT, and further to 71.06% after DPO, representing a 30% overall gain. This combined approach allowed the smaller Qwen3-1.7B model to outperform larger models like Llama 3.2 3B Instruct (62.67%) and Qwen3-0.6B (62.02%) in tool-calling accuracy.
Key takeaway
For AI Engineers focused on deploying reliable agentic applications, combining Supervised Fine-Tuning (SFT) with Direct Preference Optimization (DPO) on Amazon SageMaker AI is crucial. This approach significantly boosts tool-calling accuracy, as demonstrated by Qwen3-1.7B's 30% accuracy gain. You should consider this multi-step fine-tuning to achieve higher performance with smaller models, reducing inference costs and improving throughput in production environments. Evaluate your models using datasets like When2Call.
Key insights
The combination of Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) significantly enhances AI agent tool-calling accuracy.
Principles
- SFT establishes foundational understanding from explicit examples.
- DPO refines model outputs by incorporating direct preference feedback.
- Smaller models can achieve superior performance with targeted fine-tuning.
Method
The process involves curating a high-quality dataset (e.g., When2Call), applying SFT to a base model (e.g., Qwen3-1.7B) using a Spectrum-based recipe, and then further refining with DPO using preference data. This is executed via SageMaker AI training jobs.
In practice
- Use NVIDIA's When2Call dataset for tool-calling evaluation.
- Implement Hugging Face TRL's "SFTTrainer" and "DPOTrainer".
- Configure DPO "beta" hyperparameter between 0.1 and 0.5.
Topics
- AI Agents
- Tool Calling
- Supervised Fine-Tuning
- Direct Preference Optimization
- Amazon SageMaker AI
- Qwen3-1.7B
Code references
- aws-samples/amazon-sagemaker-generativeai
- aws-samples/amazon-sagemaker-generativeai
- aws-samples/amazon-sagemaker-generativeai.git`
- NVIDIA/When2Call
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.