What's new in Foundry Labs - April 2026
Summary
Microsoft Foundry Labs released several new AI models in April 2026, making advanced research prototypes accessible to developers. Three belong to the MAI family: MAI-Transcribe-1, a speech recognition model reporting a 3.9% Word Error Rate on FLEURS and 50% lower GPU cost; MAI-Voice-1, a high-fidelity speech generation model; and MAI-Image-2, a text-to-image model ranked #3 on Arena.ai with 2x faster generation. Foundry Labs also introduced Harrier-oss-v1, an open-source family of multilingual text embedding models supporting 94 languages, and Phi-4-Reasoning-Vision-15B, a 15B-parameter vision model for task-aware reasoning. VibeVoice ASR, a Microsoft Research model, provides long-form, structured speech recognition with speaker diarization and timestamping. Finally, GigaTIME, a multimodal AI model, translates hematoxylin and eosin (H&E) pathology slides into virtual multiplex immunofluorescence (mIF) images across 21 protein channels, enabling population-scale modeling of the tumor immune microenvironment.
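To make the Word Error Rate figure concrete, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` is illustrative, and this is not the evaluation protocol behind the reported FLEURS number; benchmark pipelines typically also normalize casing and punctuation, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

On this definition, a 3.9% WER corresponds to roughly 4 word errors per 100 reference words.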
Key takeaway
For MLOps engineers and AI developers, Microsoft Foundry Labs offers early access to high-performance, specialized AI models. Evaluate the MAI models for core speech and image generation tasks, Harrier-oss-v1 for multilingual embeddings, and Phi-4-Reasoning-Vision-15B for vision reasoning.
Key insights
Microsoft Foundry Labs provides early access to advanced AI models for developers to explore and integrate.
Principles
- Cost-efficiency in AI inference is a key design goal.
- Multimodal and multilingual capabilities enhance AI utility.
- Smaller models can achieve competitive performance through distillation.
Method
GigaTIME is a multimodal AI model trained on paired H&E and mIF data. It generates virtual mIF images from low-cost H&E slides, which enables population-scale spatial proteomics studies.
In practice
- Use MAI-Transcribe-1 for cost-effective, accurate speech recognition.
- Deploy Harrier-oss-v1 for multilingual RAG pipelines.
- Integrate Phi-4-Reasoning-Vision-15B for visual reasoning in agentic applications.
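As a rough illustration of the retrieval step in an embedding-based RAG pipeline, the sketch below ranks documents by cosine similarity to a query vector. It assumes the embeddings have already been produced by some embedding model; it does not call Harrier-oss-v1, whose interface is not described here. The function names `cosine_similarity` and `top_k` and the tiny 2-dimensional vectors are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(doc_vecs)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


if __name__ == "__main__":
    # Placeholder vectors standing in for real document embeddings.
    docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(top_k([1.0, 0.0], docs, k=2))
```

In a multilingual setup, the query and the documents can be in different languages, provided the same embedding model maps both into one vector space. The retrieved documents are then passed to a generator model as context.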
Topics
- Microsoft Foundry Labs
- Speech AI
- Text-to-Image Generation
- Multilingual Text Embeddings
- Vision Reasoning
Best for: MLOps Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published on the Microsoft Foundry Blog.