Connecting Claude Clients with Azure API Management and Claude Models in Microsoft Foundry
Summary
This pattern, published on June 4, 2026, describes a production-ready way to connect Claude clients (Claude Code, Desktop, Cowork) to Anthropic's Claude models in Microsoft Foundry, using Azure API Management (APIM) as an LLM gateway. It addresses three organizational problems: API key sprawl, uncontrolled token usage, and poor visibility into cost and usage. In the architecture, the Claude models run in Microsoft Foundry and are billed through Azure. APIM authenticates developers with Entra ID, enforces per-user rate limits and token quotas through GenAI gateway policies, and emits per-user usage metrics to Azure Monitor. Foundry can reside in a different Azure subscription (Subscription B) from APIM (Subscription A), and APIM authenticates to Foundry with either a Foundry API key or a managed identity. Setup takes approximately 2 hours for the full implementation, or 30 minutes for a basic pilot.
Key takeaway
For AI engineers and MLOps teams deploying Claude Code, this pattern puts access and cost control in one place. You can centralize billing, enforce per-developer rate limits and token quotas, and get per-user usage visibility without exposing Anthropic API keys to developers. For production environments, consider the managed identity option, which improves security and simplifies key rotation. For a quick pilot, start with a Developer SKU APIM instance.
Key insights
An Azure API Management gateway provides secure, metered access to Anthropic Claude models in Microsoft Foundry.
Principles
- Decouple developer and backend authentication.
- Enforce token and request limits per user.
- Track usage with detailed metrics.
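The second and third principles can be illustrated with a small in-memory sketch. It is not how APIM implements its policies. It is a simplified fixed-window limiter that assumes each request arrives with a user ID and a token count. The class name `PerUserLimiter`, its parameters, and the limits in the usage note are all hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class UserBudget:
    """Counters for one user within the current fixed window."""
    window_start: float
    requests: int = 0
    tokens: int = 0


class PerUserLimiter:
    """Illustrative fixed-window limiter keyed by user ID (not APIM's algorithm)."""

    def __init__(self, max_requests, max_tokens, window_seconds=60.0,
                 clock=time.monotonic):
        self._max_requests = max_requests
        self._max_tokens = max_tokens
        self._window = window_seconds
        self._clock = clock  # injectable so the sketch can be tested without sleeping
        self._budgets = {}

    def allow(self, user_id, tokens):
        """Return True and record usage if the request fits both limits."""
        now = self._clock()
        budget = self._budgets.get(user_id)
        # Start a fresh window for new users or when the old window has expired.
        if budget is None or now - budget.window_start >= self._window:
            budget = UserBudget(window_start=now)
            self._budgets[user_id] = budget
        if budget.requests + 1 > self._max_requests:
            return False  # request-rate limit reached
        if budget.tokens + tokens > self._max_tokens:
            return False  # token quota would be exceeded
        budget.requests += 1
        budget.tokens += tokens
        return True

    def usage(self, user_id):
        """Return (requests, tokens) recorded in the user's most recent window."""
        budget = self._budgets.get(user_id)
        if budget is None:
            return (0, 0)
        return (budget.requests, budget.tokens)
```

For example, `PerUserLimiter(max_requests=2, max_tokens=100)` would accept a 40-token request from one user, reject a following 70-token request because it would exceed the token quota, and track other users independently. The `usage` method stands in for the per-user metrics a gateway would emit.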
Method
Configure Azure API Management with Entra ID for developer authentication, GenAI policies for rate limiting and token quotas, and either a Foundry API key or managed identity for backend authentication to Claude models in Microsoft Foundry.
In practice
- Deploy Claude Sonnet 4.6, Haiku 4.5, and Opus 4.6 in Foundry.
- Use the "llm-token-limit" and "rate-limit-by-key" policies.
- Set "ANTHROPIC_BASE_URL" to the APIM endpoint.
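The last step can be sketched as a small helper that prepares the environment for a Claude client process. Only the `ANTHROPIC_BASE_URL` variable comes from the summary above. The function name `claude_client_env`, the HTTPS check, and the example gateway URL are assumptions for illustration, and how the developer's Entra ID token reaches the gateway is not covered here.

```python
import os
from urllib.parse import urlparse


def claude_client_env(apim_endpoint, base_env=None):
    """Return a copy of the environment with ANTHROPIC_BASE_URL set to the gateway.

    `apim_endpoint` is the gateway URL that fronts the Claude models. The HTTPS
    requirement is a choice made for this sketch, not something stated in the source.
    """
    parsed = urlparse(apim_endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"expected an https:// gateway URL, got {apim_endpoint!r}")
    # Copy so the caller's environment (or os.environ) is not modified in place.
    env = dict(os.environ if base_env is None else base_env)
    # Strip a trailing slash so clients do not build paths containing "//".
    env["ANTHROPIC_BASE_URL"] = apim_endpoint.rstrip("/")
    return env


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the URL of your own APIM instance.
    env = claude_client_env("https://example-apim.azure-api.net/claude")
    print(env["ANTHROPIC_BASE_URL"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a client, which keeps the gateway setting scoped to that process instead of changing the whole shell session.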
Topics
- Azure API Management
- Microsoft Foundry
- Anthropic Claude
- LLM Gateway
- Entra ID
- Token Quotas
- Cost Management
Best for: AI Engineer, MLOps Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published on the Microsoft Foundry Blog.