Introducing Fireworks AI on Microsoft Foundry: Bringing high performance, low latency open model inference to Azure

· Source: The Microsoft Cloud Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, short

Summary

Microsoft has announced the public preview of Fireworks AI on Microsoft Foundry, integrating high-performance open model inference into Azure. This initiative aims to provide developers with a unified platform to efficiently run, customize, and operationalize open models within an enterprise-ready AI lifecycle. Fireworks AI offers industry-leading inference, processing over 13 trillion tokens daily and sustaining approximately 180,000 requests per second, with benchmark performance validated by Artificial Analysis. The integration allows access to models like DeepSeek V3.2, OpenAI gpt-oss-120b, Kimi K2.5, and the new MiniMax M2.5, supporting serverless and provisioned throughput unit (PTU) pricing. Developers can also bring their own weights (BYOW) for custom models, ensuring optimized inference with Azure-grade governance and a consistent operational foundation for AI applications.

Key takeaway

For CTOs and VP of Engineering evaluating open model strategies, the integration of Fireworks AI into Microsoft Foundry offers a compelling solution. Your teams can now achieve high-performance, low-latency inference for open models within a governed Azure environment, reducing the need for bespoke serving stacks. This enables faster experimentation, streamlined operationalization, and greater control over costs and customization, accelerating your path from AI concept to secure, scalable production deployment.

Key insights

Fireworks AI on Microsoft Foundry provides high-performance, enterprise-grade open model inference within Azure's unified AI platform.

Principles

Method

Developers can access Fireworks AI models via the Microsoft Foundry catalog, select a model, view its card, choose a deployment option (serverless or PTU), and then deploy it for inference.

In practice

Topics

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, MLOps Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by The Microsoft Cloud Blog.