MirrorCode: Evidence that AI can already do some weeks-long coding tasks
Summary
MirrorCode, a project co-developed by METR and Epoch AI, provides evidence that AI systems can already complete some complex, weeks-long coding tasks. The work is part of METR's broader mission to research, develop, and evaluate frontier AI systems for autonomous capabilities and potential societal harm. Other featured research includes a survey in which technical workers self-reported a median productivity increase of 1.4-2x from AI tools; preliminary monitorability evaluations that test whether AI agents can bypass oversight; and an analysis of time-horizon trends across nine benchmarks, which found generally similar doubling times of about 7 months in scientific reasoning, math, robotics, computer use, and self-driving.
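To make the doubling-time figure concrete: if a capability measure grows exponentially with a fixed doubling time d, the growth factor after t months is 2^(t/d). The Python sketch below assumes a clean exponential trend and uses the 7-month figure quoted above as its default. The function names are hypothetical, and real benchmark trends are noisy fits rather than exact curves.

```python
import math


def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """Multiplicative change after `months`, assuming pure exponential growth."""
    return 2 ** (months / doubling_months)


def implied_doubling_time(start: float, end: float, months: float) -> float:
    """Doubling time (in months) implied by growing from `start` to `end` over `months`."""
    if start <= 0 or end <= start:
        raise ValueError("need 0 < start < end for a positive doubling time")
    # Solve end = start * 2 ** (months / d) for d.
    return months * math.log(2) / math.log(end / start)


if __name__ == "__main__":
    print(growth_factor(14))            # two doublings: 4.0
    print(round(growth_factor(12), 2))  # one year at a 7-month doubling time
    print(implied_doubling_time(1.0, 4.0, 14))
```

Under these assumptions, one year of progress corresponds to roughly a 3.3x increase, and growing from a 1-hour to a 4-hour time horizon over 14 months implies a 7-month doubling time.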
Key takeaway
For Directors of AI/ML weighing project timelines and resource allocation, this evidence is a reason to reassess AI's role in complex software development. Teams should explore using advanced AI on tasks previously considered too extensive to automate. Given METR's focus on autonomous capabilities and oversight, also prioritize robust monitoring and safety protocols when deploying AI systems with significant autonomy.
Key insights
AI systems can already autonomously complete some complex, multi-week coding projects.
Principles
- AI capabilities are advancing to handle complex, multi-week tasks.
- AI improvement rates show broadly similar doubling times (about 7 months) across domains.
- Monitoring AI agents for side tasks is a critical research area.
Method
Prototype evaluations test monitors' ability to catch AI agents performing side tasks and the agents' ability to bypass this monitoring.
In practice
- AI can already handle some multi-week coding projects.
- Technical workers self-report a median 1.4-2x productivity increase from AI tools.
Topics
- MirrorCode
- AI Coding
- Autonomous AI
- AI Capabilities Measurement
- AI Productivity
- AI Monitoring
- AI Safety
Best for: CTO, VP of Engineering/Data, Research Scientist, AI Scientist, Director of AI/ML, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by METR.