The Best Open 32B Model (Open Thoughts Agents)

· Source: Discover AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Data Science & Analytics · Depth: Advanced, extended

Summary

Research from multiple universities and companies, including Stanford, Harvard, and Amazon, introduces a six-stage data curation pipeline designed to significantly improve open-source AI agent performance. This methodology, developed through over 100 ablation experiments, focuses on optimizing supervised fine-tuning (SFT) data for models like the Qwen 3 32B. Key stages involve strategic task sourcing from diverse platforms like SWE Smith and Stack Exchange, mixing complementary sources for broader generalization, and filtering tasks based on strong model response length. Crucially, the study identified GLM 4.7 quantized as a superior teacher model over GPT-5.3/5.5 for generating agent trajectories and emphasized keeping trajectories with at least five turns for richer multi-step behavior. The resulting Open Sinker agent 32B model demonstrates superior performance across financial, medical, and software engineering benchmarks. Additionally, the research explored synthetic data augmentation to scale task diversity from 1,000 to over 21,000 unique phrasings for larger datasets, and found mixed results regarding the general benefit of adding a reinforcement learning stage after SFT.

Key takeaway

For Machine Learning Engineers building custom AI agents, focus on meticulously curating your supervised fine-tuning data. Implement a multi-stage pipeline, prioritizing diverse task sourcing and filtering for longer, information-rich trajectories (at least five turns). Consider GLM 4.7 quantized as a teacher model for trajectory generation, as this approach has shown to significantly improve agent performance and generalization, potentially outperforming models fine-tuned with GPT-5.3/5.5-generated data.

Key insights

Optimizing supervised fine-tuning data through a structured pipeline significantly enhances open-source AI agent performance and generalization.

Principles

Method

A six-stage pipeline for SFT data curation: task sourcing, mixing, augmentation (found ineffective), filtering, teacher model selection (GLM 4.7 quantized), and trajectory filtering (min. five turns).

In practice

Topics

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Student

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.