Tracking Every Token: Granular Cost and Usage Metrics for Microsoft Foundry Agents

2026-04-06 · Source: Microsoft Foundry Blog articles · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Data Science & Analytics · Depth: Intermediate, medium

Summary

Microsoft has released a solution for granular cost and usage tracking of AI agents deployed in Microsoft Foundry, addressing the difficulty of attributing expenses per agent, per model, and per request. The solution uses Azure API Management (APIM) as an AI gateway and Application Insights for telemetry storage and querying. APIM handles routing, rate limiting, and authentication, and adds trace headers. Application Insights receives token-level data via OpenTelemetry, which populates the `customMetrics` table with cumulative counters and the `traces` table with detailed log entries. With this architecture, users can attribute costs in real time and answer specific questions, such as an agent's average cost per request or the prompt-to-completion token breakdown per model, without modifying the agents themselves. The solution is extensible: any Foundry-hosted agent exposed through APIM is supported with minimal configuration.
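To make the two example questions concrete, here is a minimal Python sketch of the arithmetic behind them: average cost per request per agent, and the prompt-to-completion token breakdown per model. It is an illustration only, not the solution's own code. The record fields (`agent`, `model`, `prompt_tokens`, `completion_tokens`), the model names, and the per-1K-token prices are all hypothetical. In the described solution this kind of aggregation would be done with KQL over Application Insights data.

```python
from collections import defaultdict

# Hypothetical per-1K-token prices in USD. Real rates depend on model and region.
PRICES_PER_1K = {
    "model-a": {"prompt": 0.005, "completion": 0.015},
    "model-b": {"prompt": 0.001, "completion": 0.002},
}


def request_cost(model, prompt_tokens, completion_tokens):
    """Cost of one request: tokens / 1000 * price, summed over both token types."""
    price = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * price["prompt"] + (
        completion_tokens / 1000
    ) * price["completion"]


def average_cost_per_agent(records):
    """Average cost per request, grouped by agent."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r["agent"]] += request_cost(
            r["model"], r["prompt_tokens"], r["completion_tokens"]
        )
        counts[r["agent"]] += 1
    return {agent: totals[agent] / counts[agent] for agent in totals}


def token_breakdown_per_model(records):
    """Total prompt and completion tokens, grouped by model."""
    breakdown = defaultdict(lambda: {"prompt": 0, "completion": 0})
    for r in records:
        breakdown[r["model"]]["prompt"] += r["prompt_tokens"]
        breakdown[r["model"]]["completion"] += r["completion_tokens"]
    return {model: dict(counts) for model, counts in breakdown.items()}


# Hypothetical per-request records, shaped like rows one might export from telemetry.
SAMPLE_RECORDS = [
    {"agent": "support-agent", "model": "model-a", "prompt_tokens": 1000, "completion_tokens": 200},
    {"agent": "support-agent", "model": "model-a", "prompt_tokens": 2000, "completion_tokens": 400},
    {"agent": "search-agent", "model": "model-b", "prompt_tokens": 500, "completion_tokens": 500},
]

if __name__ == "__main__":
    print(average_cost_per_agent(SAMPLE_RECORDS))
    print(token_breakdown_per_model(SAMPLE_RECORDS))
```

Keeping prompt and completion tokens separate matters because the two are typically priced differently, so a single total-token count would not be enough to reconstruct cost.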

Key takeaway

For AI architects and MLOps engineers managing AI agents in Microsoft Foundry, this APIM and Application Insights solution provides granular cost visibility. You can attribute costs per agent and per model, optimize prompt design, and make informed model-selection decisions. The approach delivers real-time telemetry and KQL-driven insights without changes to existing agent code, which simplifies cost management and operations.

Key insights

Granular AI agent cost and usage tracking is achievable by integrating Azure API Management and Application Insights.

Principles

Centralize AI gateway functions via APIM.
Capture token-level telemetry via OpenTelemetry.
Enable custom cost analysis with KQL queries.

Method

Route AI agent requests through Azure API Management, which stamps each request with metadata. Capture token usage from the responses and send it to Application Insights via OpenTelemetry. Query the `traces` and `customMetrics` tables with KQL for detailed cost analysis.
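The capture step can be sketched in Python as follows. This is a hedged illustration of the idea, not the solution's implementation, which performs this work at the gateway and exports through OpenTelemetry. The sketch assumes the response body carries an OpenAI-style `usage` object with `prompt_tokens` and `completion_tokens`. The record field names (`trace_id`, `agent_id`, `model`) and the function names are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def extract_usage(response_body: str) -> dict:
    """Read token counts from an OpenAI-style `usage` object (assumed shape).

    Missing or null usage data is treated as zero tokens rather than an error.
    """
    usage = json.loads(response_body).get("usage") or {}
    prompt = int(usage.get("prompt_tokens", 0))
    completion = int(usage.get("completion_tokens", 0))
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }


def build_telemetry_record(agent_id, model, response_body, trace_id=None):
    """Combine gateway-stamped metadata with token usage into one flat record."""
    record = {
        # Hypothetical field names; a gateway would normally supply the trace ID.
        "trace_id": trace_id or uuid.uuid4().hex,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "model": model,
    }
    record.update(extract_usage(response_body))
    return record


if __name__ == "__main__":
    body = json.dumps({"usage": {"prompt_tokens": 120, "completion_tokens": 30}})
    print(build_telemetry_record("support-agent", "model-a", body))
```

Because the metadata is attached outside the agent, the agent code itself stays unchanged, which matches the no-modification property described above.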

In practice

Use APIM for AI agent routing and rate limiting.
Implement OpenTelemetry for real-time cost telemetry.
Build custom KQL dashboards for agent cost visibility.

Topics

Microsoft Foundry
Azure API Management
Application Insights
AI Agent Cost Attribution
Token-level Telemetry

Code references

ccoellomsft/foundry-agents-apim-appinsights

Best for: AI Engineer, MLOps Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.