Does AGENTS.md Actually Help Coding Agents?

Source: AI Newsletter · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Advanced, medium

Summary

A new paper from ETH Zurich's SRI Lab, "Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?", tests whether repository-level context files (such as CLAUDE.md or AGENTS.md) actually help coding agents. The study evaluated Claude Code (Sonnet-4.5), Codex (GPT-5.2 and GPT-5.1 mini), and Qwen Code (Qwen3-30b-coder) on hundreds of real GitHub issues, using both the standard SWE-bench Lite and a new benchmark, AGENTbench, which comprises 138 instances from 12 less-popular Python repositories that already have developer-written context files.

The key findings: LLM-generated context files reduced task success rates by 0.5% on SWE-bench Lite and 2% on AGENTbench, while increasing inference cost by over 20%. Human-written context files, by contrast, improved success rates by 4% on average across both benchmarks, but still cost 14-22% more reasoning tokens and 2-4 additional steps per task. The core difference is redundancy: LLM-generated files often duplicate existing documentation, whereas human-written files provide unique, non-obvious information.

Key takeaway

If you are an AI scientist or machine learning engineer designing or implementing coding agents, critically evaluate the content of your context files. Prioritize human-written files that provide specific, non-redundant information about project quirks or non-default tooling. Avoid LLM-generated context files that merely rehash existing documentation: in this study they lowered success rates and raised inference cost by over 20% without a measurable benefit.

Key insights

Context files for coding agents are beneficial only when they provide unique, non-redundant information.
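One rough way to act on this insight is to check how much of a context file merely repeats documentation the agent can already read. The sketch below is an illustrative heuristic, not the paper's method: the function names `shingles` and `redundancy_ratio` and the choice of word 3-grams are assumptions made for this example.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9_]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def redundancy_ratio(context_file: str, existing_docs: str, n: int = 3) -> float:
    """Fraction of the context file's n-grams that also occur in existing docs.

    1.0 means every n-gram is already in the docs (fully redundant);
    0.0 means none are shared, or the context file is shorter than n words.
    """
    ctx = shingles(context_file, n)
    if not ctx:
        return 0.0
    return len(ctx & shingles(existing_docs, n)) / len(ctx)
```

A high ratio against the README or contributor docs suggests the file is mostly restating what already exists. Any threshold for "too redundant" would have to be chosen per project; the study does not prescribe one.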

Method

The study introduced AGENTbench, a benchmark of 138 instances drawn from 12 less-popular Python repositories that ship developer-written context files, and used it alongside SWE-bench Lite to evaluate coding agents with and without context files.
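A with/without comparison of this kind can be sketched as a paired analysis over the instances both runs attempted. This is a minimal illustration under stated assumptions, not the paper's evaluation harness: the `RunResult` record, its fields, and the `compare` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Outcome of one agent run on one benchmark instance (hypothetical schema)."""
    instance_id: str
    resolved: bool  # did the agent's patch pass the instance's tests?
    steps: int      # number of agent steps taken


def compare(baseline: list, with_context: list) -> dict:
    """Compare runs without and with a context file on shared instances only."""
    base = {r.instance_id: r for r in baseline}
    ctx = {r.instance_id: r for r in with_context}
    shared = sorted(base.keys() & ctx.keys())
    if not shared:
        raise ValueError("no instances in common between the two runs")
    n = len(shared)
    base_rate = sum(base[i].resolved for i in shared) / n
    ctx_rate = sum(ctx[i].resolved for i in shared) / n
    return {
        "instances": n,
        "baseline_rate": base_rate,
        "context_rate": ctx_rate,
        # Difference in success rate, in percentage points.
        "delta_points": 100 * (ctx_rate - base_rate),
        # Average number of extra steps taken when the context file is present.
        "mean_extra_steps": sum(ctx[i].steps - base[i].steps for i in shared) / n,
    }
```

Restricting the comparison to shared instance IDs keeps the two success rates comparable when one run is missing results. Tracking steps next to success reflects the finding that context files can add work even when they help.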

Best for: Machine Learning Engineer, AI Scientist, Research Scientist, AI Engineer, Software Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Newsletter.