Announcing Fireworks AI on Microsoft Foundry

2026-03-11 · Source: Microsoft Foundry Blog articles · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, short

Summary

Microsoft Foundry now offers high-performance, low-latency inference for popular open models hosted on the Fireworks AI cloud. The public preview includes serverless pay-per-token and global provisioned throughput deployments for models such as MiniMax M2.5, OpenAI's gpt-oss-120b, MoonshotAI's Kimi-K2.5, and DeepSeek-V3.2. Customers can also import and deploy their own fine-tuned versions of supported open models, such as Qwen3-14B, inside their Foundry projects, with high-speed transfer of model files through the Azure Developer CLI (`azd`). Serverless deployments in the US Data Zone come with a 25K or 250K tokens-per-minute (TPM) quota, while provisioned throughput offers consistent performance, with per-model PTU requirements and latency targets. To enable the integration, customers must opt in through the Preview features panel in the Azure portal.
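
To make the TPM quotas more concrete, here is a minimal sketch that estimates how many requests fit into one minute of quota. It assumes every request consumes the same number of tokens and that the quota counts all tokens a request uses; both are simplifying assumptions, not documented Foundry behavior. The helper name `max_requests_per_minute` is hypothetical.

```python
# Hypothetical helper: how many requests per minute fit under a TPM quota.
# Assumption: every request consumes the same number of tokens, and the quota
# counts all of them. Real traffic varies, so treat the result as a rough bound.

QUOTAS_TPM = {"25K": 25_000, "250K": 250_000}  # quota tiers named in the summary


def max_requests_per_minute(quota_tpm: int, tokens_per_request: int) -> int:
    """Return the number of whole requests that fit in one minute of quota."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return quota_tpm // tokens_per_request


if __name__ == "__main__":
    # Example workload: requests averaging 1,500 tokens each.
    for tier, tpm in QUOTAS_TPM.items():
        print(tier, max_requests_per_minute(tpm, 1_500))
```

Integer division is used because a partially served request does not count toward throughput.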

Key takeaway

For machine learning engineers deploying open or custom fine-tuned models, the Fireworks AI integration in Microsoft Foundry adds deployment flexibility: choose serverless pay-per-token inference for cost efficiency, or provisioned throughput for consistent, low-latency performance, all within existing Foundry projects. Evaluate the new model catalog and consider moving your post-trained models to these deployment options.

Key insights

Microsoft Foundry integrates Fireworks AI for enhanced open model inference and custom model deployment.

Principles

Offer diverse deployment options for AI models.
Support custom fine-tuned models for specific use cases.

Method

Customers opt in through the Azure portal, then deploy models with either serverless pay-per-token or provisioned throughput. Custom models are imported through the Foundry UI or the `azd` CLI.

In practice

Deploy gpt-oss-120b serverless at $0.17 per 1M input tokens.
Use `azd ai models create` for fast transfer of model weights.
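
As a rough illustration of pay-per-token pricing, the sketch below estimates input-token cost at the $0.17 per 1M rate quoted for gpt-oss-120b. Output-token pricing is not given in this summary, so it is deliberately left out; the function name `input_cost_usd` is hypothetical, and actual billing may differ.

```python
# Hypothetical estimator for serverless pay-per-token input cost.
# Only the $0.17 per 1M input-token rate for gpt-oss-120b comes from the summary;
# output-token pricing is not listed here, so it is intentionally omitted.
from decimal import Decimal

INPUT_PRICE_PER_MILLION_USD = Decimal("0.17")  # gpt-oss-120b, per the summary


def input_cost_usd(
    input_tokens: int,
    price_per_million: Decimal = INPUT_PRICE_PER_MILLION_USD,
) -> Decimal:
    """Return the USD cost of `input_tokens` at a per-1M-token rate."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return price_per_million * input_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    # 50M input tokens at $0.17 per 1M tokens: 50 * 0.17 = 8.50 USD
    print(input_cost_usd(50_000_000))
```

`Decimal` is used instead of `float` so that currency arithmetic does not pick up binary rounding error.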

Topics

Microsoft Foundry
Fireworks AI
Large Language Model Inference
Custom Model Deployment
Serverless & Provisioned Throughput

Best for: Machine Learning Engineer, NLP Engineer, CTO, MLOps Engineer, AI Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.