Open-SWE-Traces: Advancing Dual-Mode Multilingual Distillation for Software Engineering Agents
Summary
Open-SWE-Traces is a new dataset of 207,489 agentic trajectories designed to advance autonomous software engineering. The trajectories are sourced from 20,000 real-world pull requests, generated through the OpenHands and SWE-agent harnesses, and cover nine programming languages, including Python, Go, and Java. Synthesis is dual-mode: Minimax-M2.5 produces explicit "thinking" traces, and Qwen3.5-122B produces high-quality "non-thinking" traces. The source tasks are drawn from SWE-rebench-V2 and filtered for permissive licenses such as MIT and Apache, and the resulting data supports training models for long-horizon reasoning. For validation, the authors fine-tuned the Qwen3-30B-A3B series; the best model achieved resolve rates of 61.7% on SWE-bench Verified, 57.1% on SWE-bench Multilingual, and 36.8% on SWE-bench Pro, positioning the dataset as a key resource for open-source agentic LLMs.
Key takeaway
For AI engineers developing autonomous software agents, Open-SWE-Traces helps address the shortage of training trajectories. Consider fine-tuning models such as the Qwen3-30B-A3B series on this dataset: the best fine-tuned model reported here reached a 61.7% resolve rate on SWE-bench Verified. The dataset supports training for long-horizon reasoning across nine programming languages, which is aimed at improving an agent's ability to solve real-world software problems.
Key insights
Open-SWE-Traces provides a large, dual-mode multilingual dataset to train agentic LLMs for autonomous software engineering.
Principles
- Hybrid reasoning synthesis improves trajectory data quality.
- Diverse, large-scale data is crucial for agentic LLMs.
- Permissive licensing enables open-source development.
Method
The dataset is synthesized using Minimax-M2.5 for "thinking" traces and Qwen3.5-122B for "non-thinking" traces. The source tasks are 20,000 real-world PRs, filtered from SWE-rebench-V2 for permissive licenses.
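The license filter and the dual-mode teacher assignment described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `PullRequestTask` record, its field names, the license identifier strings, and the choice to run every task in both modes are all assumptions made for illustration. Only the two teacher model names and the MIT and Apache examples come from the summary.

```python
from dataclasses import dataclass

# Assumption: license identifier strings. The summary names only MIT and
# Apache as examples of permissive licenses.
PERMISSIVE_LICENSES = {"MIT", "Apache-2.0"}

# Teacher model per trace mode, as stated in the summary.
TEACHER_BY_MODE = {
    "thinking": "Minimax-M2.5",
    "non-thinking": "Qwen3.5-122B",
}


@dataclass(frozen=True)
class PullRequestTask:
    """Hypothetical record for one source pull request."""
    repo: str
    language: str
    license: str


def filter_permissive(tasks):
    """Keep only tasks whose repository license is in the permissive set."""
    return [t for t in tasks if t.license in PERMISSIVE_LICENSES]


def plan_synthesis(tasks):
    """Pair each permissively licensed task with a teacher for each mode.

    Assumption: every task is synthesized in both modes. The summary does
    not say how tasks are split between the two teachers.
    """
    return [
        (task.repo, mode, teacher)
        for task in filter_permissive(tasks)
        for mode, teacher in TEACHER_BY_MODE.items()
    ]


if __name__ == "__main__":
    tasks = [
        PullRequestTask("example/alpha", "Python", "MIT"),
        PullRequestTask("example/beta", "Go", "GPL-3.0"),
    ]
    for job in plan_synthesis(tasks):
        print(job)
```

In this sketch, the GPL-licensed task is dropped before any teacher is assigned, so license filtering bounds the synthesis cost as well as the redistribution terms.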
In practice
- Fine-tune Qwen3-30B-A3B for agentic tasks.
- Develop LLMs for long-horizon software reasoning.
- Benchmark agent performance on SWE-bench variants.
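For the benchmarking step, the resolve rate is the share of benchmark instances whose issue the agent's patch resolves. The sketch below shows that arithmetic only. The `resolve_rate` helper and the instance IDs are hypothetical, and the outcomes are made-up placeholders rather than results from the article.

```python
def resolve_rate(outcomes):
    """Return the percentage of instances marked as resolved.

    `outcomes` maps an instance ID to True (resolved) or False (unresolved).
    """
    if not outcomes:
        raise ValueError("outcomes must contain at least one instance")
    resolved = sum(1 for ok in outcomes.values() if ok)
    return 100.0 * resolved / len(outcomes)


if __name__ == "__main__":
    # Placeholder outcomes for four hypothetical instances.
    outcomes = {
        "instance-1": True,
        "instance-2": False,
        "instance-3": True,
        "instance-4": True,
    }
    print(f"Resolve rate: {resolve_rate(outcomes):.1f}%")
```

Reporting the rate separately per benchmark variant, as the summary does for SWE-bench Verified, Multilingual, and Pro, keeps results comparable across runs.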
Topics
- Open-SWE-Traces
- Software Engineering Agents
- Multilingual LLMs
- Agentic Trajectories
- Hybrid Reasoning
- SWE-bench Benchmarking
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.