Open-SWE-Traces: Advancing Dual-Mode Multilingual Distillation for Software Engineering Agents

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

Open-SWE-Traces is a new, expansive dataset comprising 207,489 agentic trajectories designed to advance autonomous software engineering. Sourced from 20,000 real-world pull requests via OpenHands and SWE-agent harnesses, this dataset supports nine programming languages, including Python, Go, and Java. It employs a hybrid-reasoning synthesis, utilizing Minimax-M2.5 for explicit "thinking" processes and Qwen3.5-122B for high-quality "non-thinking" traces. Filtered for permissive licenses like MIT and Apache from SWE-rebench-V2, Open-SWE-Traces facilitates training models for long-horizon reasoning. Validation involved fine-tuning the Qwen3-30B-A3B series, with the best model achieving resolve rates of 61.7% on SWE-bench Verified, 57.1% on SWE-bench Multilingual, and 36.8% on SWE-bench Pro, establishing it as a key resource for open-source agentic LLMs.

Key takeaway

For AI Engineers developing autonomous software agents, Open-SWE-Traces offers a critical resource to overcome data bottlenecks. You should consider fine-tuning models like the Qwen3-30B-A3B series with this dataset to achieve higher resolve rates on benchmarks like SWE-bench Verified. This dataset enables training for long-horizon reasoning across nine programming languages, directly improving your agent's real-world problem-solving capabilities.

Key insights

Open-SWE-Traces provides a large, dual-mode multilingual dataset to train agentic LLMs for autonomous software engineering.

Principles

Method

The dataset is synthesized using Minimax-M2.5 for "thinking" traces and Qwen3.5-122B for "non-thinking" traces, sourced from 20,000 real-world PRs and filtered from SWE-rebench-V2.

In practice

Topics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.