Why (Senior) Engineers Struggle to Build AI Agents — Philipp Schmid, Google DeepMind

Source: AI Engineer · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Intermediate, medium

Summary

Philipp Schmid of Google DeepMind identifies five core reasons engineers find building AI agents harder than traditional software development. First, agents treat "text as the new state," moving beyond rigid data structures to semantic meaning. Second, developers must "hand over control" to non-deterministic LLMs rather than enforce predefined workflows. Third, "errors are just inputs": an agent needs recovery mechanisms instead of full restarts, so that accumulated context and compute are not thrown away. Fourth, the shift from "unit tests to evals" is essential, because non-deterministic behavior has to be measured through success rates and subjective outcome evaluation rather than single pass/fail assertions. Finally, APIs must be "agent ready": unlike traditional, static APIs, they need semantic interfaces, because agents lack the implicit context that human developers bring.
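The "errors are just inputs" idea can be sketched as a small loop. This is an illustration under stated assumptions, not Schmid's implementation: `lookup_order`, `scripted_model`, and `run_agent` are hypothetical names, and the model is a hard-coded stub standing in for an LLM so the example is deterministic.

```python
from typing import Callable


def lookup_order(order_id: str) -> str:
    """Hypothetical tool: rejects malformed IDs, as a real API might."""
    if not order_id.startswith("ORD-"):
        raise ValueError("order_id must look like 'ORD-1234'")
    return f"Order {order_id} has shipped"


def scripted_model(history: list[str]) -> str:
    """Stand-in for an LLM: chooses the next tool argument from the history.

    A real model would read the error text and decide for itself; this stub
    hard-codes that behavior (an assumption made for the example).
    """
    if any(line.startswith("ERROR") for line in history):
        return "ORD-1234"  # corrected after seeing the error
    return "1234"          # first attempt is malformed


def run_agent(model: Callable[[list[str]], str], max_steps: int = 3) -> list[str]:
    history = ["TASK: find the status of order 1234"]
    for _ in range(max_steps):
        argument = model(history)
        try:
            result = lookup_order(argument)
        except ValueError as exc:
            # The error becomes one more observation; earlier context is kept
            # instead of restarting the whole run.
            history.append(f"ERROR: {exc}")
            continue
        history.append(f"RESULT: {result}")
        break
    return history


history = run_agent(scripted_model)
for line in history:
    print(line)
```

The point of the design is that the failed call costs one extra step rather than the whole run: the task and the error message stay in the history the model sees on its next turn.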

Key takeaway

AI engineers building agent-based systems must fundamentally rethink traditional software development paradigms. Embrace non-deterministic outcomes by designing for recovery and treating errors as inputs rather than restarting processes. Shift testing from unit assertions to evaluations that measure reliability and subjective success. Make tools and APIs semantically rich and self-documenting for agent consumption instead of assuming the context a human developer would have. Together, these changes are what make agents robust and adaptable.
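One way to picture a "semantically rich and self-documenting" tool is a description that spells out purpose, usage conditions, and parameter meaning in plain language. The sketch below is a hypothetical structure invented for illustration (`ToolSpec`, `issue_refund`, and its fields are assumptions, not any framework's API).

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical container for the context an agent cannot infer on its own."""
    name: str
    purpose: str
    when_to_use: str
    parameters: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        # Produce the plain-text description that would be shown to the model.
        lines = [
            f"{self.name}: {self.purpose}",
            f"Use when: {self.when_to_use}",
            "Parameters:",
        ]
        lines += [f"  - {param}: {meaning}" for param, meaning in self.parameters.items()]
        return "\n".join(lines)


refund_tool = ToolSpec(
    name="issue_refund",
    purpose="Refund a paid order to the customer's original payment method.",
    when_to_use="The customer asks for money back and the order is already paid.",
    parameters={
        "order_id": "Order identifier in the form 'ORD-1234'.",
        "amount_cents": "Refund amount as an integer in cents, at most the order total.",
    },
)

print(refund_tool.render())
```

A human developer could learn the ID format or the unit of `amount_cents` from a colleague or a wiki; the agent only has what the description states, so those details are written out.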

Key insights

Building AI agents demands a paradigm shift from deterministic software, embracing non-determinism, semantic understanding, and continuous evaluation.

Method

Agent development follows an iterative loop: define instructions, run the agent, observe its behavior, adjust prompts and tools, and repeat until reliability improves. This contrasts with the traditional spec-code-test-deploy cycle.
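The "observe, adjust, improve reliability" step implies measuring a success rate over many runs rather than asserting on a single one. A minimal sketch, assuming a stubbed agent: `flaky_agent` and its `skill` parameter are hypothetical stand-ins for a real agent and for the effect of a prompt or tool change.

```python
import random


def flaky_agent(task: str, rng: random.Random, skill: float) -> str:
    """Stand-in for a non-deterministic agent; answers correctly with probability `skill`.

    The task text is ignored here; a real agent would act on it.
    """
    return "4" if rng.random() < skill else "unsure"


def success_rate(skill: float, runs: int = 200, seed: int = 0) -> float:
    """Run the same task many times and report the fraction of correct answers."""
    rng = random.Random(seed)  # fixed seed so the evaluation is repeatable
    passed = sum(
        flaky_agent("What is 2 + 2?", rng, skill) == "4" for _ in range(runs)
    )
    return passed / runs


baseline = success_rate(skill=0.6)  # hypothetical: original prompt
revised = success_rate(skill=0.9)   # hypothetical: after adjusting the prompt
print(f"baseline prompt: {baseline:.0%}, revised prompt: {revised:.0%}")
```

Comparing the two rates, instead of checking one output, is what lets a prompt or tool change be judged as an improvement or a regression.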

Topics

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.