I Built an Azure AI Agent That Passed Every Test. Here’s Why I Still Added a Human Approval Step.

2026-06-25 · Source: Towards AI - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Intermediate, long

Summary

An Azure AI agent designed for internal access requests, built using Azure OpenAI, Azure AI Search, and Azure Functions, consistently passed all functional, retrieval, tool, safety, and regression tests. Despite this, the author implemented a human approval step for any action that modifies system state. The architecture ensures the agent only prepares proposed actions, which are then routed to a human reviewer via Azure Logic Apps and Microsoft Teams. This approach prioritizes accountability and transparency, employing a risk-based model to determine automation levels for low, medium, and high-risk actions. The system logs all decisions and actions, advocating for a progressive increase in autonomy based on accumulated operational evidence rather than initial test success.

Key takeaway

AI engineers deploying agents that modify systems should recognize that passing tests does not equate to production trust. Design accountability upfront by separating agent recommendations from execution. Implement a human approval step for high-risk actions, and use a risk-based model so the agent earns autonomy progressively. This keeps actions safe and auditable, and lets you ship agents faster while scaling trust deliberately on operational evidence.
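A minimal sketch of what a risk-based routing policy could look like. The summary names three risk levels and says state-modifying actions go to a human; it does not say how low- and medium-risk actions are handled, so the `Route` values, the `route` function, and the medium-risk behavior below are illustrative assumptions, not the author's implementation.

```python
from enum import Enum


class Route(Enum):
    # Hypothetical automation levels, named for illustration only.
    AUTO_EXECUTE = "auto_execute"
    EXECUTE_AND_NOTIFY = "execute_and_notify"
    HUMAN_APPROVAL = "human_approval"


def route(risk: str, modifies_state: bool) -> Route:
    """Pick an automation level for a proposed action.

    Assumed policy: anything that modifies system state, anything
    high-risk, and anything with an unrecognized risk label goes to a
    human reviewer (fail closed).
    """
    if modifies_state or risk == "high":
        return Route.HUMAN_APPROVAL
    if risk == "medium":
        return Route.EXECUTE_AND_NOTIFY
    if risk == "low":
        return Route.AUTO_EXECUTE
    return Route.HUMAN_APPROVAL


print(route("low", modifies_state=False).value)
print(route("low", modifies_state=True).value)
```

Failing closed on unknown labels means a malformed or novel risk classification cannot silently widen the agent's autonomy.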

Key insights

AI agent autonomy must be earned through operational evidence, not just test passes, with human approval as a critical control.

Principles

Accountability is a design-time decision, not a post-deployment one.
Production trust for AI agents is earned operationally.
Separate AI agent recommendation from system execution.

Method

An orchestrator uses Azure OpenAI for reasoning and Azure AI Search for retrieval. It routes each proposed action to a human approval queue via Azure Logic Apps and executes it only after explicit human sign-off, with every decision and action logged for audit.
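The separation between proposing and executing can be sketched without any cloud services. This is an in-memory illustration under stated assumptions: `ProposedAction`, `ApprovalGate`, and the log record shape are hypothetical names, and the reviewer decision is passed in directly where the described system would receive it from the Logic Apps and Teams approval flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class ProposedAction:
    # What the agent recommends; it carries no ability to execute itself.
    action: str
    target: str
    requested_by: str


@dataclass
class ApprovalGate:
    # The executor is injected, so the agent never holds it directly.
    execute: Callable[[ProposedAction], str]
    audit_log: list = field(default_factory=list)

    def review(self, proposal: ProposedAction, reviewer: str,
               approved: bool) -> Optional[str]:
        # Log the human decision before anything runs.
        self.audit_log.append({
            "event": "reviewed", "action": proposal.action,
            "target": proposal.target, "reviewer": reviewer,
            "approved": approved,
        })
        if not approved:
            return None
        result = self.execute(proposal)
        self.audit_log.append({
            "event": "executed", "action": proposal.action,
            "target": proposal.target, "result": result,
        })
        return result


gate = ApprovalGate(execute=lambda p: f"{p.action} applied to {p.target}")
proposal = ProposedAction("grant_access", "finance-reports", "user@example.com")
print(gate.review(proposal, reviewer="alice", approved=True))
print(len(gate.audit_log))
```

Because rejection is logged too, the audit trail records what the agent proposed even when nothing was executed.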

In practice

Implement permission-aware RAG for data security.
Version prompts to track behavior changes.
Validate agent output against a strict JSON contract.
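As one illustration of the last item, the check below validates raw model output against a strict contract using only the standard library. The field names (`action`, `target`, `risk`, `justification`) and the allowed risk values are assumptions made for this sketch; the summary does not specify the actual contract.

```python
import json

# Hypothetical contract: exactly these fields, each a string.
REQUIRED_FIELDS = {"action": str, "target": str, "risk": str,
                   "justification": str}
ALLOWED_RISK = {"low", "medium", "high"}


def validate_agent_output(raw: str) -> dict:
    """Parse and validate agent output; raise ValueError on any deviation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")

    # Strict: reject both missing and unexpected keys.
    missing = REQUIRED_FIELDS.keys() - data.keys()
    extra = data.keys() - REQUIRED_FIELDS.keys()
    if missing or extra:
        raise ValueError(
            f"missing={sorted(missing)} unexpected={sorted(extra)}")

    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} must be a string")
    if data["risk"] not in ALLOWED_RISK:
        raise ValueError(f"risk must be one of {sorted(ALLOWED_RISK)}")
    return data


ok = validate_agent_output(
    '{"action": "grant_access", "target": "finance-reports", '
    '"risk": "high", "justification": "Manager-approved ticket"}'
)
print(ok["risk"])
```

Rejecting unexpected keys, not just missing ones, keeps the model from smuggling extra instructions or parameters into a downstream executor.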

Topics

Azure AI Agents
Human-in-the-Loop
Responsible AI
AI Testing
Risk-Based Automation
MLOps Architecture
Progressive Autonomy

Best for: AI Engineer, MLOps Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.