MirrorCode: Evidence that AI can already do some weeks-long coding tasks

2026-04-10 · Source: METR · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

MirrorCode, a project co-developed by METR and Epoch AI, provides evidence that AI systems can already perform some complex, weeks-long coding tasks. The project is part of METR's broader mission to research, develop, and evaluate frontier AI systems for autonomous capabilities and potential societal harm. Other featured research includes a survey in which technical workers self-report a median 1.4-2x productivity increase from AI tools; preliminary monitorability evaluations that test whether monitors can catch AI agents performing side tasks and whether agents can bypass that oversight; and an analysis of time-horizon trends across nine benchmarks, which observes generally similar doubling times of about 7 months across scientific reasoning, math, robotics, computer use, and self-driving.
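
To make the doubling-time figure concrete, the sketch below shows the arithmetic a 7-month doubling time implies if growth stays exponential at a constant rate. That constant-rate assumption, the function names, and the starting horizons used in the example are illustrative and do not come from METR's analysis.

```python
import math

# Assumption: the doubling time cited in the summary, in months.
DOUBLING_MONTHS = 7.0


def projected_horizon(current_hours: float, months_ahead: float,
                      doubling_months: float = DOUBLING_MONTHS) -> float:
    """Extrapolate a task time horizon, assuming constant exponential growth."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_to_reach(current_hours: float, target_hours: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed for the horizon to grow from current_hours to target_hours."""
    return doubling_months * math.log2(target_hours / current_hours)


if __name__ == "__main__":
    # Hypothetical starting point: a 40-hour (one work-week) horizon.
    print(projected_horizon(40.0, 14.0))  # two doublings -> 160.0 hours
    print(months_to_reach(40.0, 320.0))   # three doublings -> 21.0 months
```

Under these assumptions, two doublings (14 months) multiply the horizon by four. Real trends may deviate from a clean exponential, so results like these are rough extrapolations rather than forecasts.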

Key takeaway

For Directors of AI/ML planning project timelines and resource allocation, this evidence suggests reassessing AI's role in complex software development. Your teams should explore using advanced AI for tasks previously considered too extensive to automate. Given METR's focus on autonomous capabilities and oversight, also prioritize robust monitoring and safety protocols when deploying AI systems that act with significant autonomy.

Key insights

AI systems can already complete some complex, multi-week coding projects autonomously.

Principles

AI capabilities are advancing to handle complex, multi-week tasks.
AI improvement rates show generally similar doubling times, about 7 months, across the domains analyzed.
Monitoring AI agents for side tasks is a critical research area.

Method

Prototype evaluations test monitors' ability to catch AI agents performing side tasks and the agents' ability to bypass this monitoring.

In practice

AI can already tackle some multi-week coding projects.
Technical workers self-report a median 1.4-2x productivity increase from AI tools.

Topics

MirrorCode
AI Coding
Autonomous AI
AI Capabilities Measurement
AI Productivity
AI Monitoring
AI Safety

Best for: CTO, VP of Engineering/Data, Research Scientist, AI Scientist, Director of AI/ML, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by METR.