Introducing Phi-4-Reasoning-Vision to Microsoft Foundry

· Source: Microsoft Foundry Blog articles · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, medium

Summary

Microsoft has introduced Phi-4-Reasoning-Vision-15B, a 15-billion-parameter small language model (SLM) now available in Microsoft Foundry and Hugging Face, which combines high-resolution visual perception with selective, task-aware reasoning. This model extends SLMs beyond passive perception, enabling them to interpret visual structure, connect it with textual context, and perform multi-step reasoning for applications like agents, analytical tools, and scientific workflows. Developers can explicitly enable or disable reasoning via prompting to balance latency and accuracy, making it suitable for interactive, real-world applications. Benchmarks show competitive performance across multimodal reasoning, mathematics, and computer use tasks, including diagram-based math and GUI interpretations. The model supports use cases such as computer-use agents in retail scenarios and visual reasoning for education, and was developed with Microsoft's Responsible AI Principles, incorporating safety considerations throughout its training and evaluation.

Key takeaway

Phi-4-Reasoning-Vision-15B is a 15B parameter SLM now available in Microsoft Foundry and Hugging Face, integrating high-resolution vision with selective, task-aware reasoning. It allows explicit control over reasoning to balance latency and accuracy, demonstrating strong performance on benchmarks like 84.8% on AI2D_TEST and 83.3% on ChartQA_TEST. This enables efficient, grounded visual understanding for applications such as computer-use agents, scientific analysis, and interactive educational tools.

Topics

Best for: Computer Vision Engineer, AI Architect, AI Scientist, AI Engineer, Machine Learning Engineer, AI Researcher

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Foundry Blog articles.