Do AGENTS.md/CLAUDE.md Files Help Coding Agents? A New Paper Challenges the Assumption
Summary
A new paper from ETH Zurich and LogicStar.ai, "Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?", rigorously tests the common practice of adding AGENTS.md or CLAUDE.md files to code repositories for AI agents. The study evaluated four agent configurations: Claude Code (Sonnet-4.5), Codex (GPT-5.2 and GPT-5.1 Mini), and Qwen Code (Qwen3-30B). Each was tested on two benchmarks, SWE-Bench Lite and a new benchmark called AGENTBENCH, under three conditions: no context file, an LLM-generated context file, and a human-written context file. Contrary to intuition, LLM-generated context files lowered task success by 2 to 3%, while human-written files offered only a marginal 4% improvement. Both types of context file raised inference cost by more than 20% and increased the number of steps needed to complete tasks. The researchers found that agents follow the instructions in these files to a fault, using more tools and more reasoning tokens without achieving better outcomes. This suggests that context files are mainly useful as a compensation mechanism for missing documentation.
Key takeaway
For AI architects and MLOps engineers optimizing coding-agent performance and cost, the takeaway is to re-evaluate whether AGENTS.md or CLAUDE.md files are needed at all. If your repository is already well documented, skip auto-generated context files, which add inference cost and reduce efficiency. Instead, consider a concise, human-written file that covers only non-obvious instructions, such as specific tool usage (e.g., "use uv, not pip"), or generate one only for undocumented projects, since redundant information hinders agent performance.
Key insights
Context files for coding agents often increase cost and steps without improving task success in well-documented repositories.
Principles
- More context does not always equal better agent performance.
- Redundancy in documentation is costly for autonomous agents.
- Agents can be over-instructed, leading to inefficient behavior.
Method
The study ran four coding agents on two benchmarks (SWE-Bench Lite, AGENTBENCH) under three context conditions (no file, LLM-generated file, human-written file) and measured task success, inference cost, and tool use.
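The evaluation design can be pictured as a grid of runs compared against a no-file baseline. The snippet below is a minimal illustration, not the authors' harness: the agent, benchmark, and condition labels come from this summary, while `experiment_grid`, `relative_change`, and the condition keys are assumptions made for the example. It also shows how a figure such as "more than 20% higher inference cost" is usually computed, as a relative change against the baseline; this summary does not say whether the paper's success figures are relative changes or percentage points.

```python
from itertools import product

# Labels taken from the summary; the grid structure itself is illustrative.
AGENTS = [
    "Claude Code (Sonnet-4.5)",
    "Codex (GPT-5.2)",
    "Codex (GPT-5.1 Mini)",
    "Qwen Code (Qwen3-30B)",
]
BENCHMARKS = ["SWE-Bench Lite", "AGENTBENCH"]
CONDITIONS = ["none", "llm_generated", "human_written"]  # hypothetical keys


def experiment_grid():
    """Return the full (agent, benchmark, condition) cross product.

    This is the design as described in the summary; whether the paper ran
    every single cell is not stated here.
    """
    return list(product(AGENTS, BENCHMARKS, CONDITIONS))


def relative_change(treatment: float, baseline: float) -> float:
    """Relative change of a metric (e.g. cost per task) vs. the no-file baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treatment - baseline) / baseline * 100.0


if __name__ == "__main__":
    grid = experiment_grid()
    print(len(grid))  # 4 agents x 2 benchmarks x 3 conditions = 24
```

Keeping the baseline as its own condition makes every reported delta a comparison within the same agent and benchmark, which is how cost and success differences are attributed to the context file rather than to the model.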
In practice
- Avoid LLM-generated context files in well-documented repos.
- Generate context files for repositories with zero documentation.
- Manually trim generated context files for partially documented repos.
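For the partially documented case, one way to start the manual trim is to flag lines in a generated context file that merely restate the README. The sketch below is a hypothetical helper, not a tool from the paper: the function name `trim_redundant_lines`, its normalization rules, and the exact-match criterion are all assumptions. It only catches verbatim restatements, so a human still decides what counts as non-obvious (for example, keeping a line like "use uv, not pip").

```python
import re


def _normalize(line: str) -> str:
    """Lowercase, drop leading markdown markers, collapse whitespace, strip trailing periods."""
    line = line.strip().lower()
    line = re.sub(r"^[-*#>\s]+", "", line)  # remove bullets, heading hashes, quote markers
    return re.sub(r"\s+", " ", line).rstrip(".")


def trim_redundant_lines(context_text: str, readme_text: str) -> tuple[str, list[str]]:
    """Split a context file into kept text and lines already present in the README.

    Hypothetical heuristic: a non-blank line is redundant only if its normalized
    form matches a normalized README line exactly. Blank lines are always kept.
    """
    readme_lines = {_normalize(l) for l in readme_text.splitlines() if l.strip()}
    kept, dropped = [], []
    for line in context_text.splitlines():
        if line.strip() and _normalize(line) in readme_lines:
            dropped.append(line)
        else:
            kept.append(line)
    return "\n".join(kept), dropped


if __name__ == "__main__":
    readme = "# Project\nRun tests with pytest.\n"
    agents_md = "- Run tests with pytest.\n- Use uv, not pip.\n"
    kept, dropped = trim_redundant_lines(agents_md, readme)
    print(kept)     # - Use uv, not pip.
    print(dropped)  # ['- Run tests with pytest.']
```

Exact matching after light normalization is deliberately conservative: it never removes an instruction the README does not already state, which matches the advice to keep only non-obvious guidance.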
Topics
- Coding Agents
- Repository Context Files
- LLM Performance Evaluation
- Documentation Redundancy
- Inference Cost
Best for: AI Architect, MLOps Engineer, Machine Learning Engineer, AI Engineer, Software Engineer, AI Researcher
Editorial summary, takeaway, and curation by AIssential. Original article published by To Data & Beyond.