Generally Available: Evaluations, Monitoring, and Tracing in Microsoft Foundry
Summary
Microsoft Foundry has made its Evaluations, Monitoring, and Tracing capabilities generally available through the Foundry Control Plane, integrating AI agent observability with Azure Monitor. The release targets the problem of keeping AI agent quality stable in production, where foundation model updates, prompt changes, retrieval pipeline drift, and real-world traffic can each degrade performance. Foundry's approach is continuous evaluation across the AI lifecycle: local development, CI/CD, and live production monitoring. It provides built-in evaluators for Coherence, Relevance, Groundedness, Retrieval Quality, and Safety, and supports custom LLM-as-a-Judge and code-based evaluators. All observability data is published to Azure Monitor, which enables cross-stack correlation, unified alerting, and enterprise governance, and evaluation results are linked directly to OpenTelemetry-based traces for root cause analysis. A Prompt Optimizer, in public preview, helps improve prompts systematically.
Key takeaway
For CTOs and VPs of Engineering deploying AI agents, pre-deployment evaluation alone is not enough to sustain production quality. Adopt continuous evaluation and integrated observability, such as Microsoft Foundry's, to monitor agent performance against live traffic, correlate AI-specific metrics with broader infrastructure telemetry, and diagnose issues quickly. This approach helps keep AI investments aligned with operational standards and reduces the risk from model drift and unexpected edge cases.
Key insights
Continuous evaluation and integrated observability are crucial for maintaining AI agent quality in dynamic production environments.
Principles
- Evaluation must be continuous, not episodic.
- AI observability should integrate with existing infrastructure monitoring.
- Link evaluation results directly to traces for root cause analysis.
Method
Foundry's method involves continuous evaluation using built-in and custom evaluators, publishing all observability data to Azure Monitor for unified alerting and correlation, and linking evaluation results to OpenTelemetry-based traces for detailed diagnostics.
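To make the trace-linking step concrete, here is a minimal sketch in plain Python. It does not use the Foundry SDK or any OpenTelemetry API. The `EvalResult` record, the `failing_traces` helper, the 1-5 score scale, and the trace IDs are all hypothetical names and values chosen for illustration. The idea it shows is only that each evaluation score carries the ID of the trace that produced the response, so low scores can be grouped by trace for inspection.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical record: one evaluator score tied to one trace."""
    trace_id: str    # assumed: ID of the trace that produced the response
    evaluator: str   # e.g. "groundedness"
    score: float     # assumed scale: 1 (worst) to 5 (best)


def failing_traces(results, threshold=3.0):
    """Map each trace ID to the evaluators that scored below the threshold."""
    by_trace = defaultdict(list)
    for result in results:
        if result.score < threshold:
            by_trace[result.trace_id].append(result.evaluator)
    return dict(by_trace)


# Illustrative data only; not output from any real system.
results = [
    EvalResult("trace-a", "groundedness", 2.0),
    EvalResult("trace-a", "relevance", 4.5),
    EvalResult("trace-b", "coherence", 4.0),
    EvalResult("trace-c", "groundedness", 1.0),
    EvalResult("trace-c", "relevance", 2.5),
]

for trace_id, evaluators in failing_traces(results).items():
    print(trace_id, evaluators)
```

With results keyed this way, a low Groundedness score points directly at the trace to open, rather than at an aggregate metric that still has to be traced back to a request.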
In practice
- Use built-in evaluators for core quality and safety checks.
- Implement custom evaluators for domain-specific criteria.
- Configure Azure Monitor alerts for AI quality degradation.
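The practices above can be sketched in plain Python under some loud assumptions. `citation_score` is a hypothetical code-based evaluator for a made-up domain rule (every bracketed citation must name an approved source), and `RollingQualityMonitor` is a hypothetical stand-in for a degradation alert. In a real deployment the alert rule would be configured in Azure Monitor rather than in application code; neither name comes from the Foundry or Azure SDKs.

```python
import re
from collections import deque


def citation_score(response, allowed_sources):
    """Hypothetical code-based evaluator.

    Returns the fraction of bracketed citations, such as [kb-1], that
    name an approved source. A response with no citations scores 0.0.
    """
    cited = re.findall(r"\[([^\]]+)\]", response)
    if not cited:
        return 0.0
    valid = sum(1 for source in cited if source in allowed_sources)
    return valid / len(cited)


class RollingQualityMonitor:
    """Hypothetical alert: fires when the rolling mean drops below a threshold."""

    def __init__(self, window=3, threshold=0.7):
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def record(self, score):
        """Add a score; return True once the window is full and its mean is too low."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        mean = sum(self.scores) / len(self.scores)
        return window_full and mean < self.threshold


# Illustrative responses only.
allowed = {"kb-1", "kb-2"}
responses = [
    "Reset the device as described in [kb-1].",
    "See [kb-1] and [kb-9] for details.",
    "No supporting source was found.",
]

monitor = RollingQualityMonitor(window=3, threshold=0.7)
for text in responses:
    score = citation_score(text, allowed)
    print(score, monitor.record(score))
```

Waiting for a full window before alerting is a deliberate choice in this sketch: it avoids firing on a single bad response, at the cost of a short delay before sustained degradation is flagged.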
Topics
- AI Agent Observability
- Continuous Evaluation
- Retrieval-Augmented Generation
- Prompt Engineering
- Azure Monitor
Best for: CTO, VP of Engineering/Data, Director of AI/ML, MLOps Engineer, AI Operations Specialist, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published on the Microsoft Foundry Blog.