Open-SWE-Traces: Advancing Dual-Mode Multilingual Distillation for Software Engineering Agents
Summary
Open-SWE-Traces is a dataset of 207,489 agentic trajectories built to advance autonomous software engineering. The trajectories come from 20,000 real-world pull requests, generated with the OpenHands and SWE-agent scaffolds, and cover nine programming languages including Python, Go, and Java. Synthesis is dual-mode: MiniMax-M2.5 produces the "thinking" traces and Qwen3.5-122B the "non-thinking" traces, which supports a "switchable" reasoning framework. Source repositories are drawn from SWE-rebench V2 and filtered for permissive licenses (MIT, Apache, BSD). The dataset targets training models for long-horizon reasoning. As validation, fine-tuning the Qwen3-30B-A3B series yielded a model with resolve rates of 61.7% on SWE-bench Verified, 57.1% on SWE-bench Multilingual, and 36.8% on SWE-bench Pro. The resource is aimed at distilling human-level software engineering capabilities into efficient, open-source agentic LLMs.
Key takeaway
If you are an AI engineer building autonomous software engineering agents, prioritize training on diverse, dual-mode multilingual datasets such as Open-SWE-Traces. Combining "thinking" and "non-thinking" trajectories improves issue resolution across multiple languages and complex tasks. Consider keeping trajectories from unresolved tasks in the training data, since they improve model robustness. The reported results suggest higher resolve rates and better generalization to real-world scenarios.
Key insights
Dual-mode multilingual distillation with diverse agentic trajectories significantly improves LLM performance in software engineering tasks.
Principles
- Multilingual data boosts cross-lingual transfer.
- Including unresolved trajectories enhances learning.
- Thinking modes improve execution efficiency.
Method
Open-SWE-Traces is constructed in three stages: repository selection (SWE-rebench V2, permissive licenses, nine languages), dual-mode trajectory synthesis with MiniMax-M2.5 and Qwen3.5-122B running in OpenHands and SWE-agent, and multi-stage quality filtering, including AST-based detection of "git hacking".
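The filtering stages can be illustrated with a minimal sketch. This is not the dataset's actual pipeline: the summary does not define "git hacking", so the code assumes it means an agent inspecting git history to recover the reference fix. The license identifiers, the list of suspicious git subcommands, and all function names are hypothetical choices made for illustration.

```python
import ast
import shlex

# Assumption: SPDX-style identifiers for the permissive license families named above.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}
# Assumption: git subcommands that could expose history containing the fix.
SUSPICIOUS_GIT = {"log", "show", "reflog", "cherry-pick", "fetch", "pull"}
# Function names commonly used to launch shell commands from Python.
RUNNERS = {"run", "call", "check_call", "check_output", "Popen", "system", "popen"}


def is_permissive(license_id):
    """Return True if the repository license is in the permissive allow-list."""
    return license_id in PERMISSIVE


def tokens_use_suspicious_git(tokens):
    """Return True if 'git' is directly followed by a suspicious subcommand."""
    return any(
        tok == "git" and nxt in SUSPICIOUS_GIT
        for tok, nxt in zip(tokens, tokens[1:])
    )


def shell_uses_suspicious_git(command):
    """Tokenize a shell command line and check it for suspicious git usage."""
    try:
        return tokens_use_suspicious_git(shlex.split(command))
    except ValueError:  # unbalanced quotes and similar
        return False


def python_uses_suspicious_git(source):
    """Walk the Python AST and flag subprocess-style calls that run suspicious git."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name not in RUNNERS:
            continue
        for arg in node.args:
            # List form: subprocess.run(["git", "log"])
            if isinstance(arg, (ast.List, ast.Tuple)):
                words = [
                    e.value for e in arg.elts
                    if isinstance(e, ast.Constant) and isinstance(e.value, str)
                ]
                if tokens_use_suspicious_git(words):
                    return True
            # String form: os.system("git log -p")
            elif isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                if shell_uses_suspicious_git(arg.value):
                    return True
    return False
```

A trajectory would be dropped if any of its shell commands or agent-written Python scripts is flagged. The check is deliberately narrow: it misses forms such as `git -C repo log`, so a real filter would need broader coverage.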
In practice
- Train agents with both "thinking" and "non-thinking" traces.
- Incorporate multilingual data for broader task resolution.
- Include unresolved task trajectories in training data.
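The three recommendations above can be combined when assembling a fine-tuning set. The sketch below is an illustration under stated assumptions, not the paper's recipe: the `Trajectory` fields, the mode tags, and the `unresolved_fraction` knob are all hypothetical, and the summary gives no ratio for resolved versus unresolved data.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    task_id: str
    language: str   # e.g. "python", "go", "java"
    mode: str       # "thinking" or "non_thinking"
    resolved: bool  # did the agent's patch resolve the task?
    text: str       # serialized agent trajectory

# Hypothetical control tags used to make the reasoning mode switchable.
MODE_TAGS = {"thinking": "<mode:think>", "non_thinking": "<mode:no_think>"}


def build_training_mix(trajectories, unresolved_fraction, seed=0):
    """Keep all resolved trajectories and add unresolved ones up to a target share.

    unresolved_fraction is the maximum share of unresolved trajectories in the
    final mix (an assumed knob; it must satisfy 0 <= value < 1).
    """
    if not 0 <= unresolved_fraction < 1:
        raise ValueError("unresolved_fraction must be in [0, 1)")
    rng = random.Random(seed)
    resolved = [t for t in trajectories if t.resolved]
    unresolved = [t for t in trajectories if not t.resolved]
    # Solve k / (len(resolved) + k) <= fraction for k.
    budget = int(len(resolved) * unresolved_fraction / (1 - unresolved_fraction))
    sampled = rng.sample(unresolved, min(budget, len(unresolved)))
    mix = resolved + sampled
    rng.shuffle(mix)
    # Prefix each example with its mode tag so one model can learn both modes.
    return [
        {
            "task_id": t.task_id,
            "language": t.language,
            "resolved": t.resolved,
            "text": f"{MODE_TAGS[t.mode]}\n{t.text}",
        }
        for t in mix
    ]
```

Keeping the `language` and `resolved` fields on each example makes it easy to check afterwards that the mix still spans all languages and both reasoning modes.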
Topics
- Software Engineering Agents
- LLM Fine-tuning
- Multilingual Datasets
- Dual-Mode Reasoning
- SWE-bench Benchmarks
- Trajectory Distillation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.