DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch

2026-06-17 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

DeNovoSWE is a large-scale dataset for training LLM-based code agents to generate complete software repositories from high-level documentation. Its 4,818 high-quality instances are constructed automatically by a sandboxed agentic workflow that combines a "divide and conquer" strategy, an iterative critic-repair mechanism, and difficulty-aware trajectory filtering to balance data quality and diversity. The dataset targets the scarcity of verifiable long-horizon software engineering training data. Fine-tuning Qwen3-30B-A3B on DeNovoSWE raised its score on the BeyondSWE-Doc2Repo benchmark from 5.8% to 47.2%. Qwen3.5-35B-A3B improved from 43.8% to 50.0% on BeyondSWE-Doc2Repo and from 23.5% to 27.1% on NL2RepoBench, indicating that the data improves whole-repository generation.
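To put the reported scores side by side, the short sketch below computes the absolute gain in percentage points and the after/before ratio for each model and benchmark pair. The numbers are the ones quoted in the summary; the helper name `gains` and the rounding choices are illustrative assumptions, not something taken from the paper.

```python
# Scores quoted in the summary: (model, benchmark, before %, after %).
RESULTS = [
    ("Qwen3-30B-A3B", "BeyondSWE-Doc2Repo", 5.8, 47.2),
    ("Qwen3.5-35B-A3B", "BeyondSWE-Doc2Repo", 43.8, 50.0),
    ("Qwen3.5-35B-A3B", "NL2RepoBench", 23.5, 27.1),
]


def gains(before: float, after: float) -> tuple:
    """Return (absolute gain in points, after/before ratio), both rounded."""
    absolute = round(after - before, 1)
    ratio = round(after / before, 2)
    return absolute, ratio


if __name__ == "__main__":
    for model, bench, before, after in RESULTS:
        absolute, ratio = gains(before, after)
        print(f"{model} on {bench}: +{absolute} points ({ratio}x)")
```

The smaller model shows by far the largest jump, while the stronger baseline gains a few points on each benchmark.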

Key takeaway

AI engineers building LLM-based code agents for complex software engineering tasks should consider DeNovoSWE's approach to data generation. Its automated, sandboxed pipeline and difficulty-aware filtering strategy offer a scalable way to create high-quality, long-horizon training data. Training on such data can significantly improve an agent's ability to generate entire repositories from documentation, as the gains on benchmarks like BeyondSWE-Doc2Repo show.

Key insights

Automated, structured data generation with difficulty-aware filtering scales long-horizon software engineering training for LLM agents.

Principles

"Divide and conquer" simplifies complex tasks.
Iterative critic-repair refines generated content.
Difficulty-aware filtering balances quality and diversity.

Method

DeNovoSWE uses a sandboxed multi-agent system with two phases: a "divide" phase that performs capability decomposition and profiling, and a "conquer" phase that generates documentation through an iterative draft-critic-repair loop. The pipeline also enforces strict leakage prevention.
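The draft-critic-repair loop in the "conquer" phase can be pictured as a small control loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Critique` type, the three callables, the `max_rounds` cap, and the toy section-checking critic are all hypothetical stand-ins for what would be LLM agent calls in a real pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Critique:
    """Hypothetical critic output: a list of issues, empty when the draft passes."""
    issues: List[str]

    @property
    def passed(self) -> bool:
        return not self.issues


def draft_critic_repair(
    draft: Callable[[], str],
    critic: Callable[[str], Critique],
    repair: Callable[[str, Critique], str],
    max_rounds: int = 3,  # assumed cap; the summary does not state one
) -> Tuple[str, bool]:
    """Draft once, then alternate critique and repair until the critic passes."""
    doc = draft()
    for _ in range(max_rounds):
        critique = critic(doc)
        if critique.passed:
            return doc, True
        doc = repair(doc, critique)
    # Out of rounds: report whether the last repair satisfied the critic.
    return doc, critic(doc).passed


# Toy stand-ins for the agents (illustrative only).
def toy_draft() -> str:
    return "# Project\n"


def toy_critic(doc: str) -> Critique:
    required = ["## Install", "## Usage"]
    return Critique([section for section in required if section not in doc])


def toy_repair(doc: str, critique: Critique) -> str:
    # Fix one reported issue per round by appending the missing section.
    return doc + critique.issues[0] + "\n"


if __name__ == "__main__":
    final_doc, ok = draft_critic_repair(toy_draft, toy_critic, toy_repair)
    print(ok)
    print(final_doc)
```

Returning the pass flag alongside the document lets a caller discard drafts that never satisfy the critic instead of silently keeping them.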

In practice

Fine-tune LLMs for whole-repository generation.
Use sandboxed agents for data curation.
Implement dynamic filtering for varied task difficulty.
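One way to read "dynamic filtering for varied task difficulty" is a quality threshold that relaxes for harder tasks, so that hard tasks are not filtered out entirely. The summary does not give the paper's actual rule, so everything in this sketch is an assumption: the `Trajectory` type, the use of mean test pass rate as a difficulty proxy, and the three threshold values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    task_id: str
    pass_rate: float  # fraction of the task's tests passed, in [0, 1]


def filter_by_difficulty(
    trajectories: List[Trajectory],
    easy_threshold: float = 0.9,  # assumed bar for tasks the agent usually solves
    hard_threshold: float = 0.6,  # assumed relaxed bar for hard tasks
    hard_cutoff: float = 0.5,     # assumed: mean pass rate below this marks a hard task
) -> List[Trajectory]:
    """Keep trajectories whose pass rate clears a per-task, difficulty-dependent bar."""
    by_task: Dict[str, List[Trajectory]] = defaultdict(list)
    for traj in trajectories:
        by_task[traj.task_id].append(traj)

    kept: List[Trajectory] = []
    for group in by_task.values():
        # Difficulty proxy: how well sampled trajectories do on this task on average.
        mean_pass = sum(t.pass_rate for t in group) / len(group)
        threshold = hard_threshold if mean_pass < hard_cutoff else easy_threshold
        kept.extend(t for t in group if t.pass_rate >= threshold)
    return kept


if __name__ == "__main__":
    sample = [
        Trajectory("easy", 1.0), Trajectory("easy", 0.95), Trajectory("easy", 0.7),
        Trajectory("hard", 0.65), Trajectory("hard", 0.2), Trajectory("hard", 0.1),
    ]
    for traj in filter_by_difficulty(sample):
        print(traj.task_id, traj.pass_rate)
```

With a single fixed bar of 0.9, the hard task in the sample would contribute nothing; the relaxed bar keeps its best trajectory and preserves task diversity at some cost in per-sample quality.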

Topics

LLM Code Agents
Software Engineering
Dataset Generation
Whole-Repository Generation
Long-Horizon Tasks
Difficulty-Aware Filtering
Supervised Fine-Tuning

Code references

AweAI-Team/DeNovoSWE

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.