Tracking Every Token: Granular Cost and Usage Metrics for Microsoft Foundry Agents

Source: Microsoft Foundry Blog articles · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Data Science & Analytics) · Depth: Intermediate, medium

Summary

Microsoft has released a solution for granular cost and usage tracking of AI agents deployed in Microsoft Foundry, addressing the challenge of understanding per-agent, per-model, and per-request expenses. The solution uses Azure API Management (APIM) as an AI Gateway and Application Insights for telemetry storage and querying. APIM handles routing, rate limiting, and authentication, and adds trace headers. Application Insights receives token-level data via OpenTelemetry, which populates the `customMetrics` table with cumulative counters and the `traces` table with detailed log entries. This architecture enables real-time cost attribution: users can answer specific questions, such as an agent's average cost per request or the prompt-to-completion token breakdown per model, without modifying the agents themselves. The solution is extensible and supports any Foundry-hosted agent exposed through APIM with minimal configuration.
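To make the capture step concrete, here is a minimal Python sketch of the logic the gateway applies to each response. In the described solution this work happens inside APIM, not in Python, so this only illustrates the idea. It assumes the model response carries an OpenAI-style `usage` object with `prompt_tokens` and `completion_tokens`. The header name `x-agent-id`, the `TokenUsage` type, and the `extract_usage` helper are hypothetical names introduced for this example.

```python
import json
from dataclasses import dataclass

# Hypothetical header the gateway is assumed to stamp on each request.
AGENT_HEADER = "x-agent-id"


@dataclass
class TokenUsage:
    """One telemetry record: who called which model, and how many tokens it used."""
    agent_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def extract_usage(headers: dict, body: str) -> TokenUsage:
    """Build a usage record from request headers and a JSON response body.

    Assumes an OpenAI-style payload with a top-level "usage" object.
    Missing fields fall back to "unknown" or 0 instead of raising.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    payload = json.loads(body)
    usage = payload.get("usage", {})
    return TokenUsage(
        agent_id=normalized.get(AGENT_HEADER, "unknown"),
        model=payload.get("model", "unknown"),
        prompt_tokens=int(usage.get("prompt_tokens", 0)),
        completion_tokens=int(usage.get("completion_tokens", 0)),
    )
```

A record like this is what would then be emitted as telemetry, with the agent and model as dimensions, so that costs can be grouped by either one later.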

Key takeaway

For AI Architects and MLOps Engineers managing AI agents in Microsoft Foundry, this APIM and Application Insights solution provides granular cost visibility. You can attribute costs per agent and per model, optimize prompt design, and make informed decisions on model selection. The approach provides real-time telemetry and KQL-driven insights without requiring changes to your existing agent code, which keeps cost management separate from agent development.

Key insights

Granular AI agent cost and usage tracking is achievable by integrating Azure API Management and Application Insights.

Method

Route AI agent requests through Azure API Management to stamp metadata. Capture token usage from responses and send it to Application Insights via OpenTelemetry. Query the `traces` and `customMetrics` tables using KQL for detailed cost analysis.
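The cost analysis itself is simple arithmetic over per-request token counts. The Python sketch below shows one way to compute cost per agent and model, average cost per request, and the prompt-to-completion ratio from rows exported from the telemetry store. The record field names (`agent_id`, `model`, `prompt_tokens`, `completion_tokens`), the model name `model-a`, and the prices in `PRICES_PER_1K` are all illustrative assumptions, not values from the original article or from any price list.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; replace with your actual rates.
PRICES_PER_1K = {
    "model-a": {"prompt": 0.005, "completion": 0.015},
}


def summarize_costs(records):
    """Aggregate token usage and cost per (agent_id, model) pair.

    Each record is a dict with the assumed keys agent_id, model,
    prompt_tokens, and completion_tokens (one dict per request).
    """
    totals = defaultdict(
        lambda: {"requests": 0, "prompt_tokens": 0, "completion_tokens": 0, "cost_usd": 0.0}
    )
    for record in records:
        price = PRICES_PER_1K[record["model"]]
        entry = totals[(record["agent_id"], record["model"])]
        entry["requests"] += 1
        entry["prompt_tokens"] += record["prompt_tokens"]
        entry["completion_tokens"] += record["completion_tokens"]
        entry["cost_usd"] += (
            record["prompt_tokens"] / 1000 * price["prompt"]
            + record["completion_tokens"] / 1000 * price["completion"]
        )

    summary = {}
    for key, entry in totals.items():
        completion = entry["completion_tokens"]
        summary[key] = {
            **entry,
            "avg_cost_per_request": entry["cost_usd"] / entry["requests"],
            # None when there are no completion tokens, to avoid dividing by zero.
            "prompt_to_completion_ratio": (
                entry["prompt_tokens"] / completion if completion else None
            ),
        }
    return summary
```

The same grouping and averaging can be expressed directly in KQL against the telemetry tables. Doing it in Python is mainly useful for offline reports or for combining usage with pricing data that lives outside Application Insights.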

Best for: AI Engineer, MLOps Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published on the Microsoft Foundry Blog.