Micro-World: First AMD Open-Source World Models for Interactive Video Generation
Summary
AMD has introduced Micro-World, an open-source, action-controlled interactive world model for generating high-quality, open-domain video scenes. Built on the Wan2.1 model family, Micro-World comes in image-to-world (I2W) and text-to-world (T2W) variants. It was trained on over 6,000 Minecraft gameplay clips, each 81 frames long and annotated with keyboard and mouse actions and a text caption. A two-stage training paradigm transfers action knowledge from the game environment to open-domain control: LoRA weights absorb the game's visual style, while a dedicated action processing module injects action features via ControlNet or Adaptive Layer Normalization (adaLN). The project open-sources its model weights, complete training and inference code, and a curated dataset. AMD reports that Micro-World outperforms models such as Oasis in image quality, action controllability, and temporal quality. The work runs on AMD Instinct GPUs within the ROCm ecosystem.
Key takeaway
For AI Scientists and Computer Vision Engineers developing interactive video generation or world models, Micro-World offers a fully open-sourced, performant baseline. You should explore its two-stage training paradigm and action processing module for improved action control and open-domain transferability, especially when leveraging AMD Instinct GPUs and the ROCm ecosystem. Consider adapting its dataset curation strategy for balanced action distributions in your own projects.
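As one way to picture what a balanced action distribution could mean in practice, the sketch below assigns each clip a sampling weight inversely proportional to how often its action appears. This is an illustrative assumption, not the curation procedure AMD used (which this summary does not detail); the clip IDs, action names, and the `balanced_weights` helper are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical clip records: (clip_id, dominant_action). The action names
# are illustrative, not the labels used in the Micro-World dataset.
clips = [
    ("clip_000", "forward"),
    ("clip_001", "forward"),
    ("clip_002", "forward"),
    ("clip_003", "forward"),
    ("clip_004", "jump"),
    ("clip_005", "turn_left"),
]


def balanced_weights(records):
    """Weight each clip inversely to its action's frequency, normalized to sum to 1."""
    counts = Counter(action for _, action in records)
    raw = [1.0 / counts[action] for _, action in records]
    total = sum(raw)
    return [w / total for w in raw]


weights = balanced_weights(clips)

# Sum the probability mass per action: every action ends up with an equal share.
mass_per_action = Counter()
for (_, action), w in zip(clips, weights):
    mass_per_action[action] += w

# Draw a training batch using the balanced weights (seeded for repeatability).
rng = random.Random(0)
sample = rng.choices(clips, weights=weights, k=6)
```

With these weights, rare actions such as "jump" are drawn as often as the over-represented "forward", at the cost of repeating the few rare clips more frequently.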
Key insights
Micro-World is an open-source, action-controlled world model for interactive video generation, trained on Minecraft data.
Principles
- Open-sourcing fosters reproducibility and community progress.
- Game data can proxy for learning transferable world dynamics.
- Decoupling action learning from visual style improves transferability.
Method
Micro-World trains in two stages. First, LoRA adapts the base model to game visuals. Second, an action module learns generalized control. Because the game-specific style lives in the LoRA weights, they can be removed at inference time for open-domain generation.
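The idea of removing LoRA at inference can be illustrated with the standard LoRA formulation, in which a frozen base weight W is augmented by a low-rank update B·A. The sketch below is a minimal pure-Python illustration of that mechanism only; it is not Micro-World's code, and the class name `LoRALinear`, the `lora_enabled` flag, and the toy matrices are assumptions made for the example.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(w, x):
    """Multiply a matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class LoRALinear:
    """Frozen base weight W plus an optional low-rank update scale * (B @ A)."""

    def __init__(self, base_weight, lora_b, lora_a, scale=1.0):
        self.base_weight = base_weight  # frozen, shape (out, in)
        self.lora_b = lora_b            # trainable, shape (out, rank)
        self.lora_a = lora_a            # trainable, shape (rank, in)
        self.scale = scale
        self.lora_enabled = True

    def effective_weight(self):
        if not self.lora_enabled:
            return self.base_weight
        delta = matmul(self.lora_b, self.lora_a)
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.base_weight, delta)
        ]

    def forward(self, x):
        return matvec(self.effective_weight(), x)


# Toy 2x2 layer with a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
layer = LoRALinear(base, lora_b=[[0.5], [0.0]], lora_a=[[0.0, 2.0]])
x = [1.0, 1.0]

with_style = layer.forward(x)     # adapter active: style adaptation applied
layer.lora_enabled = False
without_style = layer.forward(x)  # adapter removed: base behavior restored
```

Because the base weight is never modified, switching the adapter off recovers the original layer exactly, which is what makes dropping the game-style adaptation cheap.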
In practice
- Use Minecraft API for diverse biome data collection.
- Employ ControlNet or adaLN for action feature injection.
- Train on AMD Instinct MI325X GPUs with ROCm.
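For the adaLN option mentioned above, a common formulation (as used in diffusion transformers) normalizes the hidden features and then applies a scale and shift predicted from the conditioning signal. The sketch below assumes that general form with an action embedding as the condition; Micro-World's exact layer layout is not described here, and the function names, dimensions, and toy values are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weight, bias, x):
    """Apply y = W x + b with W stored as nested lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def adaln(hidden, action_embedding, w_scale, b_scale, w_shift, b_shift):
    """Modulate normalized features with a scale and shift predicted from the action."""
    scale = linear(w_scale, b_scale, action_embedding)
    shift = linear(w_shift, b_shift, action_embedding)
    normed = layer_norm(hidden)
    return [n * (1.0 + s) + t for n, s, t in zip(normed, scale, shift)]


hidden = [1.0, 2.0, 3.0]
action = [1.0, 0.0]  # hypothetical 2-d action embedding
zeros_w = [[0.0, 0.0] for _ in hidden]
zeros_b = [0.0 for _ in hidden]

# Zero-initialized projections: the action has no effect yet.
neutral = adaln(hidden, action, zeros_w, zeros_b, zeros_w, zeros_b)

# A nonzero shift projection lets the action move every feature by 0.5.
w_shift = [[0.5, 0.0], [0.5, 0.0], [0.5, 0.0]]
shifted = adaln(hidden, action, zeros_w, zeros_b, w_shift, zeros_b)
```

Starting the scale and shift projections at zero is a frequent design choice for this kind of conditioning, since the pretrained model's behavior is unchanged until the action pathway learns something useful.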
Topics
- World Models
- Interactive Video Generation
- Open-Source AI
- AMD ROCm
- Diffusion Models
Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Researcher
Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.