Open-SWE-Traces: Advancing Dual-Mode Multilingual Distillation for Software Engineering Agents

2026-06-14 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

Open-SWE-Traces is a dataset of 207,489 agentic trajectories for training autonomous software engineering agents. The trajectories are generated from 20,000 real-world pull requests using the OpenHands and SWE-agent harnesses and cover nine programming languages, including Python, Go, and Java. Synthesis is dual-mode: Minimax-M2.5 produces traces with explicit "thinking" steps, and Qwen3.5-122B produces high-quality "non-thinking" traces. The source pull requests come from SWE-rebench-V2 and are filtered for permissive licenses such as MIT and Apache. The dataset is intended for training models on long-horizon reasoning. Validation involved fine-tuning the Qwen3-30B-A3B series; the best model reached resolve rates of 61.7% on SWE-bench Verified, 57.1% on SWE-bench Multilingual, and 36.8% on SWE-bench Pro, which positions the dataset as a resource for open-source agentic LLMs.

Key takeaway

For AI engineers building autonomous software agents, Open-SWE-Traces addresses the shortage of trajectory training data. Consider fine-tuning a model from the Qwen3-30B-A3B series on this dataset and measuring its resolve rate on benchmarks such as SWE-bench Verified. The dataset supports training for long-horizon reasoning across nine programming languages, with the aim of improving an agent's performance on real-world software tasks.

Key insights

Open-SWE-Traces provides a large, dual-mode multilingual dataset to train agentic LLMs for autonomous software engineering.

Principles

Hybrid reasoning synthesis improves trajectory data quality.
Diverse, large-scale data is crucial for agentic LLMs.
Permissive licensing enables open-source development.

Method

The source tasks are 20,000 real-world pull requests drawn from SWE-rebench-V2 and filtered for permissive licenses. Trajectories are then synthesized in two modes: Minimax-M2.5 generates "thinking" traces and Qwen3.5-122B generates "non-thinking" traces.
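
The license filter and dual-mode split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the Trajectory record, its field names, the license identifiers in PERMISSIVE, and the function split_by_mode are all hypothetical, since this summary does not describe the dataset schema.

```python
from dataclasses import dataclass

# Hypothetical record layout; the real dataset's field names are not given here.
@dataclass
class Trajectory:
    repo_license: str   # license identifier of the source repository (assumed)
    language: str       # programming language of the task
    mode: str           # "thinking" or "non-thinking"
    teacher: str        # model that generated the trace

# Assumed allow-list; the summary only says "permissive licenses like MIT and Apache".
PERMISSIVE = {"MIT", "Apache-2.0"}

def split_by_mode(trajectories):
    """Keep permissively licensed trajectories and group them by reasoning mode."""
    buckets = {}
    for t in trajectories:
        if t.repo_license in PERMISSIVE:
            buckets.setdefault(t.mode, []).append(t)
    return buckets

sample = [
    Trajectory("MIT", "Python", "thinking", "Minimax-M2.5"),
    Trajectory("GPL-3.0", "Go", "non-thinking", "Qwen3.5-122B"),
    Trajectory("Apache-2.0", "Java", "non-thinking", "Qwen3.5-122B"),
]
buckets = split_by_mode(sample)
```

Grouping by mode keeps the two trace styles separable, so a training run could mix or weight them independently.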

In practice

Fine-tune Qwen3-30B-A3B for agentic tasks.
Develop LLMs for long-horizon software reasoning.
Benchmark agent performance on SWE-bench variants.
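
For the benchmarking step, the resolve rate is the share of benchmark instances the agent resolves. The sketch below uses toy outcomes rather than real benchmark results, and the function name resolve_rate is illustrative, not part of any SWE-bench tooling.

```python
def resolve_rate(outcomes):
    """Return the percentage of instances marked resolved (True)."""
    if not outcomes:
        raise ValueError("no benchmark outcomes to score")
    return 100.0 * sum(outcomes) / len(outcomes)

# Toy data: 3 of 5 hypothetical instances resolved.
toy_outcomes = [True, True, True, False, False]
rate = resolve_rate(toy_outcomes)
```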

Topics

Open-SWE-Traces
Software Engineering Agents
Multilingual LLMs
Agentic Trajectories
Hybrid Reasoning
SWE-bench Benchmarking

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.