Why (Senior) Engineers Struggle to Build AI Agents — Philipp Schmid, Google DeepMind
Summary
Philipp Schmid of Google DeepMind identifies five core reasons why engineers struggle to build AI agents compared to traditional software development. He highlights that agents treat "text as the new state," moving beyond rigid data structures to semantic meaning, and require developers to "hand over control" to non-deterministic LLMs rather than enforcing predefined workflows. Crucially, "errors are just inputs" for agents, necessitating recovery mechanisms instead of full restarts to preserve context and compute. The shift from "unit tests to evals" is essential, as agents' non-deterministic nature demands reliability measurement through success rates and subjective outcome evaluation. Finally, APIs must be "agent ready" with semantic interfaces, as agents lack human developers' implicit context, unlike traditional, static APIs.
Key takeaway
For AI Engineers building agent-based systems, you must fundamentally rethink traditional software development paradigms. Embrace non-deterministic outcomes by designing for recovery and treating errors as inputs, rather than restarting processes. Shift your testing from unit assertions to comprehensive evaluations that measure reliability and subjective success. Furthermore, ensure your tools and APIs are semantically rich and self-documenting for agent consumption, moving beyond assumptions of human context. This approach is crucial for building robust and adaptable AI agents.
Key insights
Building AI agents demands a paradigm shift from deterministic software, embracing non-determinism, semantic understanding, and continuous evaluation.
Principles
- Embrace non-determinism in agent design.
- Treat semantic meaning as the core state.
- Design for agent recovery, not full restarts.
Method
The agent development process involves defining instructions, running, observing, adjusting prompts/tools, and iteratively improving reliability, contrasting with traditional spec-code-test-deploy cycles.
In practice
- Implement LLM-as-a-judge for evals.
- Create self-documenting, semantic APIs.
- Provide errors back to the model as input.
Topics
- AI Agents
- LLM Development
- Software Engineering
- Agent Evaluation
- API Design
- Non-deterministic Systems
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.