Last Week in AI #334 - Kimi K2.5 & Code, Genie 3, OpenClaw & Moltbook
Summary
Moonshot AI has launched Kimi K2.5, an open-source, natively multimodal model trained on 15 trillion mixed visual and text tokens, capable of understanding text, images, and video. K2.5 demonstrates strong agentic capabilities, outperforming Gemini 3 Pro on SWE-Bench Verified, and both GPT 5.2 and Gemini 3 Pro on SWE-Bench Multilingual. For video understanding, it surpasses GPT 5.2 and Claude Opus 4.5 on VideoMMMU. Additionally, Moonshot introduced Kimi Code, an open-source coding agent that translates UI designs from images or videos into code, supporting integration with editors like VSCode. Google is expanding access to Genie 3, an experimental "general-purpose world model," to AI Ultra subscribers, enabling generation of dynamic, navigable 3D worlds from text and images. Meanwhile, OpenClaw (formerly Moltbot), an open-source, always-on AI assistant, has gained significant traction for its multi-platform messaging integration, despite security concerns regarding its access to real-world applications.
Key takeaway
Machine learning engineers evaluating new models should consider Moonshot AI's Kimi K2.5 for its multimodal capabilities and strong benchmark results in coding and video understanding. Teams seeking open-source alternatives to established models should investigate its potential for agentic applications and UI-to-code generation. Be mindful of the security implications of deploying always-on AI assistants like OpenClaw that access real-world applications.
Key insights
Multimodal AI models and coding agents are rapidly advancing, alongside new interactive world-building and always-on AI assistants.
Principles
- Multimodal training improves agentic capabilities.
- Self-distillation enhances RL learning efficiency and stability.
Method
Reinforcement Learning via Self-Distillation uses the model itself as an on-policy "self-teacher": conditioned on tokenized feedback, the model produces dense, logit-level supervision for policy updates.
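To make "dense, logit-level supervision" concrete, here is a minimal sketch of what such a signal could look like. It is not the method's actual implementation. It assumes the supervision is a per-token KL divergence from the feedback-conditioned "self-teacher" distribution to the unconditioned student distribution, with the teacher treated as a constant. The function names, the KL direction, and the random stand-in logits are all illustrative assumptions that the summary does not specify.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def self_distillation_loss(student_logits, teacher_logits):
    """Mean per-token KL(teacher || student) and its gradient w.r.t. student logits.

    Both inputs have shape (num_tokens, vocab_size).
    student_logits: the model's logits for a sampled response, given only the prompt.
    teacher_logits: the same model's logits for the same response tokens, given the
        prompt plus tokenized feedback (hypothetical setup; treated as constant).
    """
    log_p_student = log_softmax(student_logits)
    log_p_teacher = log_softmax(teacher_logits)
    p_teacher = np.exp(log_p_teacher)

    # One KL value per token position: a dense signal, unlike a single scalar reward.
    per_token_kl = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1)

    # Analytic gradient of the mean KL with respect to the student logits.
    num_tokens = student_logits.shape[0]
    grad = (np.exp(log_p_student) - p_teacher) / num_tokens
    return per_token_kl.mean(), grad


# Stand-in logits: 4 response tokens over an 8-token vocabulary (illustrative only).
rng = np.random.default_rng(0)
student_logits = rng.normal(size=(4, 8))
teacher_logits = student_logits + rng.normal(scale=0.5, size=(4, 8))

loss, grad = self_distillation_loss(student_logits, teacher_logits)
print(f"loss={loss:.4f}, grad shape={grad.shape}")
```

In this sketch every token position contributes its own gradient row, which is the sense in which the supervision is dense. A real training loop would obtain both sets of logits from forward passes of the same model and backpropagate through the student pass only.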
In practice
- Integrate Kimi Code into VSCode for UI-to-code translation.
- Use Google Genie 3 to generate navigable 3D worlds from prompts.
- Explore OpenClaw for proactive, multi-platform AI assistance.
Topics
- Multimodal AI
- AI Agents
- Generative World Models
- AI Safety & Ethics
- AI Business & Funding
Best for: Machine Learning Engineer, Computer Vision Engineer, CTO, AI Engineer, AI Product Manager, AI Researcher
Editorial summary, takeaway, and curation by AIssential. Original article published by Last Week in AI.