The Best Open 32B Model (Open Thoughts Agents)
Summary
Research from multiple universities and companies, including Stanford, Harvard, and Amazon, introduces a six-stage data curation pipeline designed to significantly improve open-source AI agent performance. This methodology, developed through over 100 ablation experiments, focuses on optimizing supervised fine-tuning (SFT) data for models like the Qwen 3 32B. Key stages involve strategic task sourcing from diverse platforms like SWE Smith and Stack Exchange, mixing complementary sources for broader generalization, and filtering tasks based on strong model response length. Crucially, the study identified GLM 4.7 quantized as a superior teacher model over GPT-5.3/5.5 for generating agent trajectories and emphasized keeping trajectories with at least five turns for richer multi-step behavior. The resulting Open Sinker agent 32B model demonstrates superior performance across financial, medical, and software engineering benchmarks. Additionally, the research explored synthetic data augmentation to scale task diversity from 1,000 to over 21,000 unique phrasings for larger datasets, and found mixed results regarding the general benefit of adding a reinforcement learning stage after SFT.
Key takeaway
For Machine Learning Engineers building custom AI agents, focus on meticulously curating your supervised fine-tuning data. Implement a multi-stage pipeline, prioritizing diverse task sourcing and filtering for longer, information-rich trajectories (at least five turns). Consider GLM 4.7 quantized as a teacher model for trajectory generation, as this approach has shown to significantly improve agent performance and generalization, potentially outperforming models fine-tuned with GPT-5.3/5.5-generated data.
Key insights
Optimizing supervised fine-tuning data through a structured pipeline significantly enhances open-source AI agent performance and generalization.
Principles
- Data quality and diversity are paramount for agent performance.
- Complementary data sources improve model generalization.
- Longer agent trajectories encode richer multi-step behavior.
Method
A six-stage pipeline for SFT data curation: task sourcing, mixing, augmentation (found ineffective), filtering, teacher model selection (GLM 4.7 quantized), and trajectory filtering (min. five turns).
In practice
- Combine SWE Smith, Stack Exchange for diverse task sourcing.
- Filter tasks by strong model's longest response length.
- Use GLM 4.7 quantized as a teacher model for trajectories.
Topics
- Open-source AI Agents
- Supervised Fine-Tuning
- Data Curation Pipeline
- Large Language Models
- Agentic Benchmarks
- Synthetic Data Augmentation
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Student
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.