Why (Senior) Engineers Struggle to Build AI Agents — Philipp Schmid, Google DeepMind

2026-05-30 · Source: AI Engineer · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

Philipp Schmid of Google DeepMind identifies five core reasons why engineers struggle to build AI agents compared to traditional software development. He highlights that agents treat "text as the new state," moving beyond rigid data structures to semantic meaning, and require developers to "hand over control" to non-deterministic LLMs rather than enforcing predefined workflows. Crucially, "errors are just inputs" for agents, necessitating recovery mechanisms instead of full restarts to preserve context and compute. The shift from "unit tests to evals" is essential, as agents' non-deterministic nature demands reliability measurement through success rates and subjective outcome evaluation. Finally, APIs must be "agent ready" with semantic interfaces, as agents lack human developers' implicit context, unlike traditional, static APIs.

Key takeaway

For AI Engineers building agent-based systems, you must fundamentally rethink traditional software development paradigms. Embrace non-deterministic outcomes by designing for recovery and treating errors as inputs, rather than restarting processes. Shift your testing from unit assertions to comprehensive evaluations that measure reliability and subjective success. Furthermore, ensure your tools and APIs are semantically rich and self-documenting for agent consumption, moving beyond assumptions of human context. This approach is crucial for building robust and adaptable AI agents.

Key insights

Building AI agents demands a paradigm shift from deterministic software, embracing non-determinism, semantic understanding, and continuous evaluation.

Principles

Embrace non-determinism in agent design.
Treat semantic meaning as the core state.
Design for agent recovery, not full restarts.

Method

The agent development process involves defining instructions, running, observing, adjusting prompts/tools, and iteratively improving reliability, contrasting with traditional spec-code-test-deploy cycles.

In practice

Implement LLM-as-a-judge for evals.
Create self-documenting, semantic APIs.
Provide errors back to the model as input.

Topics

AI Agents
LLM Development
Software Engineering
Agent Evaluation
API Design
Non-deterministic Systems

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Related on AIssential

See Counsel's argued verdicts on the open AI decisions leaders are weighing →

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.