Kimi-K2.5 Now in Microsoft Foundry
Summary
Moonshot AI's Kimi K2.5, a multimodal and agentic model, is now available through Microsoft Foundry. The release adds native multimodality, achieved by pre-training on 15 trillion additional vision-text tokens, which improves image and video understanding, OCR, and multimodal QA. It also introduces Agent Swarm execution, which can orchestrate up to 100 parallel agents and 1,500 tool calls, for a reported 4.5x reduction in execution time compared with sequential K2 workflows. Image- and video-to-code capabilities are stronger as well, covering visual debugging and UI reconstruction from visual inputs. Moonshot AI reports state-of-the-art benchmark results, including 96.1% on AIME 2025, 87.1% on MMLU-Pro, and 78.5% on MMMU-Pro (Vision). Pricing is $0.60 per 1M input tokens and $3.00 per 1M output tokens.
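To make the pricing concrete, here is a minimal cost-estimation sketch that uses only the two per-token rates quoted above. The function name `estimate_cost_usd` and the example token counts are illustrative assumptions, not part of any Foundry SDK, and actual billing may include factors this summary does not cover.

```python
from decimal import Decimal

# Rates quoted in the summary above (USD per 1M tokens).
INPUT_USD_PER_MILLION = Decimal("0.60")
OUTPUT_USD_PER_MILLION = Decimal("3.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimate the cost of a workload in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Assumed workload: 2M input tokens and 500K output tokens.
    cost = estimate_cost_usd(2_000_000, 500_000)
    print(f"Estimated cost: ${cost:.2f}")
```

Output tokens cost five times as much as input tokens at these rates, so generation-heavy workloads such as long code output will dominate the bill.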
Key takeaway
For CTOs evaluating advanced AI models for integration, Kimi K2.5's availability in Microsoft Foundry offers a compelling option due to its multimodal capabilities and Agent Swarm execution. Your teams can leverage its enhanced vision-language understanding and accelerated task completion for complex coding and QA projects, potentially reducing development cycles and improving output quality. Consider piloting Kimi K2.5 for applications requiring robust visual debugging or UI reconstruction from visual inputs.
Key insights
Kimi K2.5 combines native multimodality with parallel agentic execution, aiming for stronger vision-language understanding and shorter task completion times.
Principles
- Multimodal pre-training improves vision-language integration.
- Parallel agent orchestration reduces execution time (a reported 4.5x compared with sequential K2 workflows).
Method
Kimi K2.5 is pre-trained on 15 trillion additional vision-text tokens for native multimodality and uses Agent Swarm to run up to 100 agents and 1,500 tool calls in parallel.
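The general idea of bounded parallel fan-out can be sketched as follows. This is not Moonshot AI's Agent Swarm implementation, which the source does not describe. It is a generic `asyncio` illustration in which `run_agent` is a hypothetical stand-in that sleeps instead of calling a model, and the only value taken from the text is the cap of 100 parallel agents.

```python
import asyncio
import time

MAX_PARALLEL_AGENTS = 100  # cap quoted in the summary; everything else is assumed


async def run_agent(agent_id: int, delay_s: float) -> str:
    """Hypothetical sub-agent: sleeps to simulate model and tool latency."""
    await asyncio.sleep(delay_s)
    return f"agent-{agent_id}: done"


async def run_sequential(n_agents: int, delay_s: float) -> list[str]:
    """Baseline: run the agents one after another."""
    return [await run_agent(i, delay_s) for i in range(n_agents)]


async def run_swarm(
    n_agents: int, delay_s: float, max_parallel: int = MAX_PARALLEL_AGENTS
) -> list[str]:
    """Run agents concurrently, with at most `max_parallel` in flight at once."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(agent_id: int) -> str:
        async with semaphore:
            return await run_agent(agent_id, delay_s)

    # gather() returns results in the order the awaitables were passed in.
    return list(await asyncio.gather(*(bounded(i) for i in range(n_agents))))


def timed(coro):
    """Run a coroutine to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, sequential_s = timed(run_sequential(20, 0.05))
    _, swarm_s = timed(run_swarm(20, 0.05))
    print(f"sequential: {sequential_s:.2f}s, swarm: {swarm_s:.2f}s")
```

In this toy setup the speed-up approaches the number of agents because the simulated tasks are fully independent. Real workloads have dependencies between subtasks and coordination overhead, so the sketch should not be read as reproducing the reported 4.5x figure.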
In practice
- Use Kimi K2.5 for multimodal coding workflows.
- Apply Agent Swarm for faster complex task execution.
Topics
- Kimi K2.5
- Multimodal AI
- Agentic Models
- Microsoft Foundry
- Vision-Language Models
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, AI Product Manager
Editorial summary, takeaway, and curation by AIssential. Original article published on the Microsoft Foundry Blog.