Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI

2026-06-03 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Robotics & Autonomous Systems · Depth: Advanced, long

Summary

AI agents can autonomously handle complex, multi-step tasks, but their effectiveness depends on calling the right tools to retrieve information or take action. This post demonstrates how to improve AI agent tool-calling accuracy by combining Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) on Amazon SageMaker AI. The process involves fine-tuning the Qwen3-1.7B model using NVIDIA's When2Call dataset, which includes 15,000 SFT samples and 9,000 DPO samples. Evaluation showed Qwen3-1.7B's accuracy increased from 41.57% (base) to 60.43% after SFT, and further to 71.06% after DPO, representing a 30% overall gain. This combined approach allowed the smaller Qwen3-1.7B model to outperform larger models like Llama 3.2 3B Instruct (62.67%) and Qwen3-0.6B (62.02%) in tool-calling accuracy.

Key takeaway

For AI Engineers focused on deploying reliable agentic applications, combining Supervised Fine-Tuning (SFT) with Direct Preference Optimization (DPO) on Amazon SageMaker AI is crucial. This approach significantly boosts tool-calling accuracy, as demonstrated by Qwen3-1.7B's 30% accuracy gain. You should consider this multi-step fine-tuning to achieve higher performance with smaller models, reducing inference costs and improving throughput in production environments. Evaluate your models using datasets like When2Call.

Key insights

The combination of Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) significantly enhances AI agent tool-calling accuracy.

Principles

SFT establishes foundational understanding from explicit examples.
DPO refines model outputs by incorporating direct preference feedback.
Smaller models can achieve superior performance with targeted fine-tuning.

Method

The process involves curating a high-quality dataset (e.g., When2Call), applying SFT to a base model (e.g., Qwen3-1.7B) using a Spectrum-based recipe, and then further refining with DPO using preference data. This is executed via SageMaker AI training jobs.

In practice

Use NVIDIA's When2Call dataset for tool-calling evaluation.
Implement Hugging Face TRL's "SFTTrainer" and "DPOTrainer".
Configure DPO "beta" hyperparameter between 0.1 and 0.5.

Topics

AI Agents
Tool Calling
Supervised Fine-Tuning
Direct Preference Optimization
Amazon SageMaker AI
Qwen3-1.7B

Code references

Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer

Related on AIssential

See Counsel's argued verdicts on the open AI decisions leaders are weighing →

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.