Your Agent Loop Is Fine. Your Tools Are Why It Breaks.
Summary
This article argues that the common instability in AI agent systems stems not from the agent orchestration loop, which is often a stable "hundred lines" of code, but from poorly defined or poorly implemented tools. Drawing heavily on Anthropic's guidance and personal testing, the author proposes four fixes. First, write tool descriptions like onboarding guides, with "at least three to four sentences per tool" covering what the tool does, what its parameters mean, and when to use it. Second, collapse overlapping tools into fewer, sharper ones, often by adding action parameters or namespacing. Third, return high-signal outputs rather than raw data dumps, with volume controls and human-meaningful identifiers. Fourth, make error messages instructional, so they tell the model how to correct its input instead of exposing an opaque stack trace. The author claims these changes are crucial for agent reliability.
Key takeaway
If you are an AI engineer building agentic systems, debug your tool definitions before refactoring the agent loop again. Write detailed tool descriptions, consolidate redundant tools, return only high-signal outputs, and design error messages as actionable instructions. According to the article, these changes improve agent reliability and reduce token consumption because they target the main source of agent flakiness.
Key insights
Agent reliability hinges on well-defined tools, not complex orchestration loops.
Principles
- Detailed tool descriptions are the most important factor in how well an agent uses its tools.
- Overlapping tools distract agents and reduce efficiency.
- High-signal outputs improve model reasoning and reduce tokens.
Method
Write each tool description in at least three to four sentences covering when to use the tool, what its parameters mean, and what it returns. Consolidate overlapping tools using action parameters or namespacing. Return only high-signal fields, and add volume controls such as pagination. Craft error messages as actionable instructions for the model.
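As a rough illustration of the description, consolidation, and error-message points, the sketch below defines one tool with an `action` parameter instead of two overlapping tools, and a handler whose errors tell the model what to send next. Everything specific here is an assumption made for the example: the `tickets` tool, its fields, the `TCK-<number>` ID format, and the `run_tickets` helper are hypothetical. Only the `name` / `description` / `input_schema` layout follows the general shape of Anthropic tool definitions.

```python
from typing import Any

# Hypothetical tool definition, for illustration only.
# One tool with an "action" parameter replaces separate get/search tools.
TICKET_TOOL: dict[str, Any] = {
    "name": "tickets",
    "description": (
        "Look up or search support tickets in the help desk. "
        "Use action='get' when you already have a ticket ID such as 'TCK-1042'. "
        "Use action='search' when you only have keywords or a customer name. "
        "Returns the ticket title, status, and assignee, not the full comment history. "
        "Do not use this tool for billing or account records."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "enum": ["get", "search"],
                "description": "Which operation to run.",
            },
            "ticket_id": {
                "type": "string",
                "description": "Required for action='get'. Format: 'TCK-<number>'.",
            },
            "query": {
                "type": "string",
                "description": "Required for action='search'. Keywords to match in titles.",
            },
        },
        "required": ["action"],
    },
}

VALID_ACTIONS = ("get", "search")

# Made-up data standing in for a real ticket store.
_TICKETS = {
    "TCK-1042": {"title": "Login fails on mobile", "status": "open", "assignee": "dana"},
}


def run_tickets(args: dict[str, Any]) -> dict[str, Any]:
    """Execute the tool; on bad input, return an instruction instead of raising."""
    action = args.get("action")
    if action not in VALID_ACTIONS:
        return {"error": f"Unknown action {action!r}. Set 'action' to one of: {', '.join(VALID_ACTIONS)}."}

    if action == "get":
        ticket_id = args.get("ticket_id")
        if not ticket_id:
            return {"error": "action='get' requires 'ticket_id' (format 'TCK-<number>'). "
                             "If you only have keywords, call again with action='search' and a 'query'."}
        ticket = _TICKETS.get(ticket_id)
        if ticket is None:
            return {"error": f"No ticket with id {ticket_id!r}. "
                             "Check the id, or call again with action='search' and a 'query'."}
        return {"ticket_id": ticket_id, **ticket}

    query = (args.get("query") or "").lower()
    if not query:
        return {"error": "action='search' requires a non-empty 'query' string of keywords."}
    matches = [{"ticket_id": tid, **t} for tid, t in _TICKETS.items() if query in t["title"].lower()]
    return {"matches": matches}
```

Calling `run_tickets({"action": "get"})` returns an error that names the missing parameter and suggests the `search` action, which gives the model something concrete to retry with rather than a stack trace.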
In practice
- Test tool descriptions by giving the agent ambiguous requests for data and checking which tool it calls.
- Use `response_format` parameters for concise or detailed outputs.
- Paste agent transcripts into Claude Code for tool refactoring suggestions.
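One way a `response_format` parameter and pagination might fit together is sketched below. This is a minimal example under stated assumptions, not the article's implementation: the `list_customers` function, the customer records, and field names such as `internal_uuid` and `next_page` are invented for illustration.

```python
from typing import Any

# Made-up records standing in for a real data source. Each record carries
# low-signal fields (internal_uuid, raw_metadata) that the tool never returns.
_CUSTOMERS = [
    {
        "id": f"cus_{i:04d}",
        "name": f"Customer {i}",
        "plan": "pro" if i % 2 else "free",
        "internal_uuid": f"00000000-0000-0000-0000-{i:012d}",
        "raw_metadata": {"shard": i % 3},
    }
    for i in range(1, 8)
]


def list_customers(response_format: str = "concise", page: int = 1, page_size: int = 3) -> dict[str, Any]:
    """Return one page of customers, trimmed to the fields the model needs."""
    if response_format not in ("concise", "detailed"):
        return {"error": "response_format must be 'concise' or 'detailed'. "
                         "Use 'concise' unless you need customer ids for a follow-up call."}
    if page < 1 or page_size < 1:
        return {"error": "page and page_size must be integers >= 1. Start with page=1."}

    start = (page - 1) * page_size
    chunk = _CUSTOMERS[start:start + page_size]

    if response_format == "concise":
        # Human-meaningful fields only.
        items = [{"name": c["name"], "plan": c["plan"]} for c in chunk]
    else:
        # Adds the id needed for follow-up calls, still without internal fields.
        items = [{"id": c["id"], "name": c["name"], "plan": c["plan"]} for c in chunk]

    has_more = start + page_size < len(_CUSTOMERS)
    return {
        "items": items,
        "page": page,
        "next_page": page + 1 if has_more else None,  # tells the model how to continue
        "total": len(_CUSTOMERS),
    }
```

The default is `concise`, so the model pays for identifiers only when it asks for them, and `next_page` makes the volume control explicit instead of silently truncating results.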
Topics
- Agentic AI
- Tool Use
- LLM Tooling
- Agent Reliability
- Prompt Engineering
- Anthropic Claude
Best for: AI Engineer, Machine Learning Engineer, Prompt Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence in Plain English - Medium.