AI Observability Starter Kit for Microsoft Foundry agents
Summary
The AI Observability Starter Kit for Microsoft Foundry agents is a ready-made setup for monitoring AI agent performance and safety in production. Provisioned with a single PowerShell command, it deploys a four-agent Microsoft Foundry environment with instrumented OpenTelemetry traces, eight built-in evaluators, a custom compliance evaluator, an automated red-team scan, and two scheduled-query alerts. The four agents target gpt-4o-mini, gpt-5-mini, gpt-4.1-mini, and a broken model, so the kit can demonstrate error handling and comparative metrics. Telemetry flows into Azure Application Insights, which powers custom Grafana dashboards for operational insights (e.g., token usage, latency, error rates) and supports batch evaluation of agent quality and adversarial testing. The entire setup takes 35 to 50 minutes to deploy and costs approximately $0.03/day to run.
Key takeaway
For ML engineers and SREs who deploy AI agents on Azure and need production-grade observability without extensive manual setup, this starter kit offers a validated, automated baseline. A single command deploys a monitoring environment with instrumented traces, quality evaluators, red-team testing, and alerts. This lets you establish agent monitoring quickly, surface issues such as model errors or safety failures, and check compliance, with far less initial setup effort.
Key insights
Production-grade AI observability requires instrumented traces, automated quality and safety evaluations, and proactive alerting.
Principles
- AI agent observability extends beyond basic HTTP status: a successful response can still carry a low-quality or unsafe answer.
- OpenTelemetry spans form the core of AI agent telemetry.
- Combine built-in and custom evaluators for agent quality.
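As a rough illustration of why spans matter more than status codes, the sketch below rolls span-like records up into the per-model token usage, latency, and error rate that the dashboards report. It is not the kit's code: the record layout (`status`, `duration_ms`, `attributes`) is a hypothetical simplification, and it assumes the spans carry the OpenTelemetry GenAI attribute names `gen_ai.request.model`, `gen_ai.usage.input_tokens`, and `gen_ai.usage.output_tokens`, which the summary does not confirm.

```python
from collections import defaultdict


def summarize_spans(spans):
    """Aggregate span-like dicts into per-model operational metrics.

    Assumed (hypothetical) record shape:
      {"status": "OK" | "ERROR", "duration_ms": float, "attributes": {...}}
    """
    totals = defaultdict(
        lambda: {"calls": 0, "errors": 0, "tokens": 0, "latency_ms": 0.0}
    )
    for span in spans:
        attrs = span["attributes"]
        row = totals[attrs["gen_ai.request.model"]]
        row["calls"] += 1
        if span["status"] == "ERROR":
            row["errors"] += 1
        # Failed calls may report no token usage, so default to zero.
        row["tokens"] += attrs.get("gen_ai.usage.input_tokens", 0)
        row["tokens"] += attrs.get("gen_ai.usage.output_tokens", 0)
        row["latency_ms"] += span["duration_ms"]

    return {
        model: {
            "calls": row["calls"],
            "error_rate": row["errors"] / row["calls"],
            "total_tokens": row["tokens"],
            "avg_latency_ms": row["latency_ms"] / row["calls"],
        }
        for model, row in totals.items()
    }
```

Grouping by model is what makes a comparison across the four agents possible; in the kit itself this kind of aggregation would presumably happen in queries over Application Insights rather than in application code.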
Method
A single PowerShell script runs the whole workflow: it provisions the Azure infrastructure, deploys the Foundry agents, generates traffic, runs batch evaluations (built-in and custom), executes the automated red-team scan, and configures the Grafana dashboards and scheduled alerts.
In practice
- Set "ENABLE_INSTRUMENTATION=true" to emit OTel child spans.
- Set "ENABLE_SENSITIVE_DATA=true" to capture prompts for evaluators.
- Implement custom code-based evaluators for domain-specific policies.
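The practices above might be sketched as follows. This is a minimal illustration under stated assumptions, not the kit's implementation: `flag_enabled` is a hypothetical helper (the summary does not say how the kit parses the two environment variables), and `ComplianceEvaluator` is a hypothetical domain policy written as a plain callable that returns a dictionary of results, a shape commonly used for code-based evaluators.

```python
import os
import re


def flag_enabled(name: str, default: bool = False) -> bool:
    """Return True when environment variable `name` holds a truthy string.

    Hypothetical helper: the accepted spellings here are an assumption.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


class ComplianceEvaluator:
    """Hypothetical code-based evaluator that flags banned phrases in a response."""

    def __init__(self, banned_patterns):
        # Compile once so repeated evaluations stay cheap.
        self._patterns = [re.compile(p, re.IGNORECASE) for p in banned_patterns]

    def __call__(self, *, response: str, **kwargs) -> dict:
        # Collect every pattern that matches; an empty list means the response passes.
        violations = [p.pattern for p in self._patterns if p.search(response)]
        return {
            "compliance_pass": not violations,
            "compliance_violations": violations,
        }
```

A deterministic rule like this suits policies that can be expressed as explicit patterns, and it complements the built-in evaluators instead of replacing them. An evaluator of this kind can only score responses that were actually captured, which is why prompt and response capture has to be switched on first.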
Topics
- AI Observability
- Microsoft Foundry
- OpenTelemetry
- AI Agent Evaluation
- Red Teaming
- Azure Application Insights
- Grafana Dashboards
Best for: Software Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.