The Best Open 32B Model (Open Thoughts Agents)

2026-06-26 · Source: Discover AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Data Science & Analytics · Depth: Advanced, extended

Summary

Research from multiple universities and companies, including Stanford, Harvard, and Amazon, introduces a six-stage data curation pipeline designed to significantly improve the performance of open-source AI agents. The methodology was developed through more than 100 ablation experiments and focuses on optimizing supervised fine-tuning (SFT) data for base models such as Qwen3 32B. Key stages include sourcing tasks from diverse platforms such as SWE Smith and Stack Exchange, mixing complementary sources for broader generalization, and filtering tasks by the length of a strong model's responses. Crucially, the study found that a quantized GLM 4.7 was a better teacher model than GPT-5.3/5.5 for generating agent trajectories. It also recommends keeping only trajectories with at least five turns, because they capture richer multi-step behavior. The resulting OpenThinker Agent 32B model performs strongly across financial, medical, and software engineering benchmarks. The researchers also explored synthetic data augmentation, scaling task diversity from 1,000 to more than 21,000 unique phrasings for larger datasets. They found mixed results on whether adding a reinforcement learning (RL) stage after SFT helps in general.

Key takeaway

Machine learning engineers building custom AI agents should curate their supervised fine-tuning data meticulously. Implement a multi-stage pipeline that prioritizes diverse task sourcing and keeps only longer, information-rich trajectories (at least five turns). Consider a quantized GLM 4.7 as the teacher model for trajectory generation. This approach has been shown to significantly improve agent performance and generalization, and it can outperform models fine-tuned on GPT-5.3/5.5-generated data.

Key insights

Optimizing supervised fine-tuning data through a structured pipeline significantly enhances open-source AI agent performance and generalization.

Principles

Data quality and diversity are paramount for agent performance.
Complementary data sources improve model generalization.
Longer agent trajectories encode richer multi-step behavior.

Method

A six-stage pipeline for SFT data curation: task sourcing, mixing, augmentation (found ineffective), filtering, teacher model selection (GLM 4.7 quantized), and trajectory filtering (min. five turns).
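The last two stages of the method, teacher rollout and trajectory filtering, can be sketched in Python. This is a minimal illustration, not the authors' code. The Trajectory record, the one-string-per-turn representation, and fake_teacher are all assumptions. fake_teacher is a hypothetical stub standing in for a quantized GLM 4.7 teacher, whose serving setup the summary does not describe. Only the five-turn threshold comes from the summary.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical trajectory record: the summary does not specify a data format.
@dataclass
class Trajectory:
    task_id: str
    turns: list[str]  # assumed representation: one string per agent turn

MIN_TURNS = 5  # from the summary: keep trajectories with at least five turns

def generate_trajectories(task_ids: list[str],
                          teacher: Callable[[str], list[str]]) -> list[Trajectory]:
    """Stage 5: roll out the teacher on each task (teacher is any callable)."""
    return [Trajectory(task_id=t, turns=teacher(t)) for t in task_ids]

def filter_by_turns(trajs: list[Trajectory],
                    min_turns: int = MIN_TURNS) -> list[Trajectory]:
    """Stage 6: drop short trajectories so multi-step behavior dominates."""
    return [tr for tr in trajs if len(tr.turns) >= min_turns]

def fake_teacher(task_id: str) -> list[str]:
    """Hypothetical stub in place of a real GLM 4.7 endpoint.

    The number of turns is parsed from the id, e.g. "task-5" -> 5 turns.
    """
    n = int(task_id.split("-")[1])
    return [f"turn {i}" for i in range(n)]

if __name__ == "__main__":
    trajs = generate_trajectories(["task-2", "task-5", "task-8"], fake_teacher)
    kept = filter_by_turns(trajs)
    print([t.task_id for t in kept])  # ['task-5', 'task-8']
```

Keeping the threshold as a parameter rather than a hard-coded constant makes it easy to rerun the ablation with other cutoffs.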

In practice

Combine SWE Smith and Stack Exchange tasks for diverse task sourcing.
Filter tasks by the length of a strong model's responses, favoring tasks that elicit the longest responses.
Use GLM 4.7 quantized as a teacher model for trajectories.
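The first two practical steps, mixing task sources and filtering by a strong model's response length, might look like the sketch below. It assumes each task already carries the length of a strong model's response to it. The Task fields, the per-source weights, and the keep_fraction cutoff are hypothetical, because the summary reports neither the mixing proportions nor a length threshold. Ranking by length and keeping the top fraction is one reading of "longest response length". It was chosen here because it avoids having to fix an absolute token cutoff.

```python
import random
from dataclasses import dataclass

# Hypothetical task record; field names are illustrative, not from the source.
@dataclass
class Task:
    source: str               # e.g. "swe_smith" or "stack_exchange"
    prompt: str
    strong_response_len: int  # length of a strong model's response (assumed tokens)

def mix_sources(pools: dict[str, list[Task]], weights: dict[str, float],
                total: int, seed: int = 0) -> list[Task]:
    """Sample about `total` tasks across sources in proportion to `weights`.

    Sampling is without replacement, so a source never contributes more
    tasks than its pool holds.
    """
    rng = random.Random(seed)
    norm = sum(weights.values())
    mixed: list[Task] = []
    for name, pool in pools.items():
        k = min(len(pool), round(total * weights[name] / norm))
        mixed.extend(rng.sample(pool, k))
    rng.shuffle(mixed)  # interleave sources so no block is single-source
    return mixed

def filter_by_response_length(tasks: list[Task], keep_fraction: float) -> list[Task]:
    """Keep the fraction of tasks that elicited the longest strong-model responses."""
    ranked = sorted(tasks, key=lambda t: t.strong_response_len, reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]

if __name__ == "__main__":
    pools = {
        "swe_smith": [Task("swe_smith", f"bug {i}", 100 * i) for i in range(1, 7)],
        "stack_exchange": [Task("stack_exchange", f"q {i}", 50 * i) for i in range(1, 7)],
    }
    mixed = mix_sources(pools, {"swe_smith": 0.5, "stack_exchange": 0.5}, total=8)
    kept = filter_by_response_length(mixed, keep_fraction=0.5)
    print(len(mixed), len(kept))  # 8 4
```

With equal weights, each source contributes four tasks. The length filter then keeps the four tasks with the longest strong-model responses, regardless of their source.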

Topics

Open-source AI Agents
Supervised Fine-Tuning
Data Curation Pipeline
Large Language Models
Agentic Benchmarks
Synthetic Data Augmentation

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.