Tracking Every Token: Granular Cost and Usage Metrics for Microsoft Foundry Agents
Summary
Microsoft has released a solution for granular cost and usage tracking of AI agents deployed in Microsoft AI Foundry, addressing the difficulty of understanding per-agent, per-model, and per-request expenses. The solution uses Azure API Management (APIM) as an AI gateway and Application Insights for telemetry storage and querying. APIM handles routing, rate limiting, and authentication, and adds trace headers. Application Insights receives token-level data via OpenTelemetry, which populates the `customMetrics` table with cumulative counters and the `traces` table with detailed log entries. With this architecture, users can attribute cost in real time and answer specific questions, such as an agent's average cost per request or the prompt-to-completion token breakdown per model, without modifying the agents themselves. The solution extends to any Foundry-hosted agent exposed through APIM with minimal configuration.
Key takeaway
For AI architects and MLOps engineers managing AI agents in Microsoft Foundry, this APIM and Application Insights solution provides granular cost visibility. You can attribute costs per agent and per model, optimize prompt design, and make informed model-selection decisions. Telemetry arrives in real time and is queryable with KQL, and existing agent code does not need to change, which simplifies cost management.
Key insights
Granular AI agent cost and usage tracking is achievable by integrating Azure API Management and Application Insights.
Principles
- Centralize AI gateway functions via APIM.
- Capture token-level telemetry via OpenTelemetry.
- Enable custom cost analysis with KQL queries.
Method
Route AI agent requests through Azure API Management, which stamps them with metadata. Capture token usage from the responses and send it to Application Insights via OpenTelemetry. Query the `traces` and `customMetrics` tables with KQL for detailed cost analysis.
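The capture step can be sketched in Python. This is a minimal illustration under stated assumptions, not the gateway's actual implementation: it assumes the model response carries an OpenAI-style `usage` object with `prompt_tokens` and `completion_tokens`, and the function name `build_usage_record`, the `agent_id` and `request_id` parameters, and the record's field names are all hypothetical. In the described solution this work happens at the gateway, not in agent code.

```python
import json
from typing import Any


def build_usage_record(response_body: str, agent_id: str, request_id: str) -> dict[str, Any]:
    """Turn one model response into a flat telemetry record.

    Assumption: the response contains an OpenAI-style `usage` object.
    The record layout is hypothetical and only illustrates how token
    counts can be paired with agent and request metadata.
    """
    payload = json.loads(response_body)
    usage = payload.get("usage", {})
    prompt = int(usage.get("prompt_tokens", 0))
    completion = int(usage.get("completion_tokens", 0))
    return {
        "agent_id": agent_id,                    # metadata stamped per request
        "request_id": request_id,
        "model": payload.get("model", "unknown"),
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }


# Illustrative response body; the model name and counts are made up.
sample = json.dumps({
    "model": "example-model",
    "usage": {"prompt_tokens": 120, "completion_tokens": 30},
})
record = build_usage_record(sample, agent_id="support-agent", request_id="req-001")
print(record)
```

A record shaped like this carries the dimensions (agent, model, request) that later queries would group by.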
In practice
- Use APIM for AI agent routing and rate limiting.
- Implement OpenTelemetry for real-time cost telemetry.
- Build custom KQL dashboards for agent cost visibility.
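To make the cost-visibility idea concrete, the following Python sketch computes two of the figures a dashboard of this kind would show: average cost per request for each agent, and the prompt versus completion token split for each model. It is a hedged illustration only. The prices in `PRICES_PER_1K`, the model and agent names, and the record layout are hypothetical, and in the described solution the equivalent aggregation would run as KQL over Application Insights data rather than in Python.

```python
from collections import defaultdict

# Hypothetical USD prices per 1,000 tokens; real prices vary by model.
PRICES_PER_1K = {
    "model-a": {"prompt": 0.005, "completion": 0.015},
    "model-b": {"prompt": 0.001, "completion": 0.002},
}


def request_cost(record):
    """Cost of one request: each token type priced separately."""
    price = PRICES_PER_1K[record["model"]]
    return (record["prompt_tokens"] / 1000 * price["prompt"]
            + record["completion_tokens"] / 1000 * price["completion"])


def average_cost_per_agent(records):
    """Mean cost per request, grouped by agent."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r["agent_id"]] += request_cost(r)
        counts[r["agent_id"]] += 1
    return {agent: totals[agent] / counts[agent] for agent in totals}


def token_breakdown_per_model(records):
    """Summed prompt and completion tokens, grouped by model."""
    breakdown = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0})
    for r in records:
        breakdown[r["model"]]["prompt_tokens"] += r["prompt_tokens"]
        breakdown[r["model"]]["completion_tokens"] += r["completion_tokens"]
    return dict(breakdown)


# Made-up usage records for illustration.
records = [
    {"agent_id": "support-agent", "model": "model-a", "prompt_tokens": 1000, "completion_tokens": 200},
    {"agent_id": "support-agent", "model": "model-b", "prompt_tokens": 2000, "completion_tokens": 1000},
    {"agent_id": "search-agent", "model": "model-b", "prompt_tokens": 500, "completion_tokens": 500},
]

print(average_cost_per_agent(records))
print(token_breakdown_per_model(records))
```

Pricing prompt and completion tokens separately matters because the two are typically billed at different rates, so a total token count alone would not give an accurate cost.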
Topics
- Microsoft AI Foundry
- Azure API Management
- Application Insights
- AI Agent Cost Attribution
- Token-level Telemetry
Best for: AI Engineer, MLOps Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.