Dockerless: Environment-Free Program Verifier for Coding Agents

2026-06-30 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, medium

Summary

Dockerless is an environment-free program verifier for coding agents that removes the need for costly per-repository Docker environments. It evaluates generated code patches without executing them: it explores the repository agentically, gathers evidence, and judges patch correctness from that evidence rather than from surface-level comparisons. On a verifier evaluation benchmark, Dockerless surpasses the strongest open-source verifier by 14.3 AUC points. When used for SFT trajectory filtering and RL rewards in a fully environment-free post-training pipeline, the resulting model achieves resolve rates of 62.0%, 50.0%, and 35.2% on SWE-bench Verified, Multilingual, and Pro, respectively. That is an improvement of 2.4, 8.7, and 2.9 points over the Qwen3.5-9B baseline, and it matches environment-based post-training.
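To make the AUC comparison concrete, here is a minimal sketch of how a verifier's ranking quality could be scored. It assumes that "AUC" means area under the ROC curve and that the verifier emits a confidence score per patch; the summary confirms neither, and the function name `roc_auc` is illustrative.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a correct patch is scored above an incorrect one.

    Assumption: labels are 1 (patch truly resolves the issue) or 0 (it does not),
    and scores are verifier confidences. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect patch")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On this reading, a gain of 14.3 AUC points would correspond to a difference of 0.143 on the 0 to 1 scale returned above.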

Key takeaway

For ML engineers developing or deploying automated coding agents, Dockerless addresses the environment setup bottleneck. If your team cannot build reproducible environments or lacks comprehensive test suites for verification, an environment-free verifier removes the per-repository Docker setup cost. It supports SFT and RL post-training pipelines that match environment-based performance, and it extends agent training to a wider range of real-world repositories.

Key insights

Dockerless is an environment-free agentic verifier that explores codebases to judge patch correctness, enabling scalable coding agent training.

Principles

Execution-based verification is costly and often infeasible.
Deep repository context is crucial for complex SWE tasks.
Agentic exploration can replace environment-based testing.

Method

Dockerless generates verification questions from an issue and reference patch, dispatches parallel sub-agents to gather evidence from the repository, then aggregates evidence for a binary correctness verdict.
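The three stages described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `generate_questions` and `explore` stand in for LLM-backed components, the `Evidence` fields are hypothetical, and the majority-style aggregation rule is an assumption, since the summary does not say how evidence is combined.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evidence:
    question: str
    supports_patch: bool  # hypothetical: did the sub-agent's findings favor the patch?
    notes: str


def verify_patch(
    issue: str,
    reference_patch: str,
    candidate_patch: str,
    generate_questions: Callable[[str, str], List[str]],
    explore: Callable[[str, str], Evidence],
    threshold: float = 0.5,  # assumed aggregation rule, not from the source
) -> bool:
    # Stage 1: derive verification questions from the issue and reference patch.
    questions = generate_questions(issue, reference_patch)
    if not questions:
        return False  # no questions means no evidence; reject conservatively

    # Stage 2: dispatch one sub-agent per question, in parallel.
    with ThreadPoolExecutor(max_workers=len(questions)) as pool:
        evidence = list(pool.map(lambda q: explore(q, candidate_patch), questions))

    # Stage 3: aggregate evidence into a binary correctness verdict.
    support = sum(e.supports_patch for e in evidence) / len(evidence)
    return support > threshold
```

Passing the two components in as callables keeps the sketch independent of any particular model or repository tooling; a real verifier would likely aggregate with a model rather than a fixed threshold.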

In practice

Filter SFT trajectories without Docker environments.
Provide RL rewards for coding agents efficiently.
Verify patches in legacy or private codebases.
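A minimal sketch of the first two uses, assuming the verifier is available as a function returning a binary verdict. The `Trajectory` fields, the function names, and the 0/1 reward mapping are illustrative assumptions; the summary does not specify the reward shape used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Trajectory:
    issue: str
    reference_patch: str
    candidate_patch: str  # the patch produced at the end of the agent rollout


# Hypothetical verifier signature: (issue, reference_patch, candidate_patch) -> verdict
Verifier = Callable[[str, str, str], bool]


def filter_sft_trajectories(
    trajectories: Iterable[Trajectory], verifier: Verifier
) -> List[Trajectory]:
    """Keep only trajectories whose final patch the verifier accepts."""
    return [
        t
        for t in trajectories
        if verifier(t.issue, t.reference_patch, t.candidate_patch)
    ]


def rl_reward(trajectory: Trajectory, verifier: Verifier) -> float:
    """Map the binary verdict to a scalar reward (assumed 1.0 / 0.0)."""
    accepted = verifier(
        trajectory.issue, trajectory.reference_patch, trajectory.candidate_patch
    )
    return 1.0 if accepted else 0.0
```

Neither function builds a container or runs a test suite, which is the point of the environment-free setup: the only per-sample cost is the verifier call.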

Topics

Program Verification
Coding Agents
Environment-Free Verification
Reinforcement Learning
Supervised Fine-Tuning
SWE-bench

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.