Introducing Fireworks AI on Microsoft Foundry: Bringing high performance, low latency open model inference to Azure
Summary
Microsoft has announced the public preview of Fireworks AI on Microsoft Foundry, integrating high-performance open model inference into Azure. This initiative aims to provide developers with a unified platform to efficiently run, customize, and operationalize open models within an enterprise-ready AI lifecycle. Fireworks AI offers industry-leading inference, processing over 13 trillion tokens daily and sustaining approximately 180,000 requests per second, with benchmark performance validated by Artificial Analysis. The integration allows access to models like DeepSeek V3.2, OpenAI gpt-oss-120b, Kimi K2.5, and the new MiniMax M2.5, supporting serverless and provisioned throughput unit (PTU) pricing. Developers can also bring their own weights (BYOW) for custom models, ensuring optimized inference with Azure-grade governance and a consistent operational foundation for AI applications.
Key takeaway
For CTOs and VPs of Engineering evaluating open model strategies, the integration of Fireworks AI into Microsoft Foundry offers a compelling option. Your teams can now achieve high-performance, low-latency inference for open models within a governed Azure environment, reducing the need for bespoke serving stacks. This enables faster experimentation, streamlined operationalization, and greater control over costs and customization, accelerating your path from AI concept to secure, scalable production deployment.
Key insights
Fireworks AI on Microsoft Foundry provides high-performance, enterprise-grade open model inference within Azure's unified AI platform.
Principles
- Open models offer control over performance, cost, and security.
- Unified platforms streamline AI lifecycle management.
- High-throughput inference is crucial for enterprise-scale AI.
Method
Developers can access Fireworks AI models via the Microsoft Foundry catalog, select a model, view its card, choose a deployment option (serverless or PTU), and then deploy it for inference.
In practice
- Deploy DeepSeek V3.2 or MiniMax M2.5 on Azure.
- Upload custom model weights for optimized inference.
- Utilize serverless pricing for experimentation.
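One way to reason about the serverless-versus-PTU choice is a rough break-even estimate. The sketch below is illustrative only: the class and function names are hypothetical, the prices are placeholder numbers and not published Foundry or Fireworks AI rates, and it assumes serverless is billed per token while a PTU is billed at a flat hourly rate. It also ignores PTU capacity limits, which a real sizing exercise would need to account for.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation: 8,760 hours per year / 12


@dataclass(frozen=True)
class PricingAssumptions:
    """Placeholder prices; substitute real rates from your own quote."""
    serverless_usd_per_million_tokens: float
    ptu_usd_per_hour: float


def monthly_serverless_cost(tokens_per_month: float, p: PricingAssumptions) -> float:
    # Pay-per-token: cost scales linearly with usage.
    return tokens_per_month / 1_000_000 * p.serverless_usd_per_million_tokens


def monthly_ptu_cost(units: int, p: PricingAssumptions) -> float:
    # Provisioned: flat cost regardless of how many tokens are processed.
    return units * p.ptu_usd_per_hour * HOURS_PER_MONTH


def break_even_tokens_per_month(units: int, p: PricingAssumptions) -> float:
    # Monthly token volume at which both options cost the same.
    return monthly_ptu_cost(units, p) / p.serverless_usd_per_million_tokens * 1_000_000


def cheaper_option(tokens_per_month: float, units: int, p: PricingAssumptions) -> str:
    serverless = monthly_serverless_cost(tokens_per_month, p)
    provisioned = monthly_ptu_cost(units, p)
    return "serverless" if serverless <= provisioned else "ptu"


if __name__ == "__main__":
    # Hypothetical prices, chosen only to make the arithmetic easy to follow.
    prices = PricingAssumptions(serverless_usd_per_million_tokens=2.0, ptu_usd_per_hour=1.0)
    print(break_even_tokens_per_month(1, prices))
    print(cheaper_option(100_000_000, 1, prices))
    print(cheaper_option(500_000_000, 1, prices))
```

Under these placeholder prices, low experimental volumes favor serverless and sustained high volumes favor provisioned capacity, which matches the suggestion to use serverless pricing for experimentation.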
Topics
- Microsoft Foundry
- Fireworks AI
- Open Model Inference
- Enterprise AI Lifecycle
- Azure AI
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by The Microsoft Cloud Blog.