Micro-World: First AMD Open-Source World Models for Interactive Video Generation

Source: AMD ROCm Blogs · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Intermediate, long

Summary

AMD has introduced Micro-World, an open-source, action-controlled interactive world model for generating high-quality, open-domain video scenes. Built on the Wan2.1 model family, Micro-World comes in two variants: image-to-world (I2W) and text-to-world (T2W). The models were trained on more than 6,000 Minecraft gameplay clips, each 81 frames long and annotated with keyboard and mouse actions and text captions. Micro-World uses a two-stage training paradigm to transfer action knowledge from the game environment to open-domain control. It combines LoRA weights with a dedicated action processing module that injects action features via ControlNet or Adaptive Layer Normalization (adaLN). The project fully open-sources its model weights, its complete training and inference code, and a curated dataset. In evaluations on AMD Instinct GPUs within the ROCm ecosystem, Micro-World outperforms models such as Oasis in image quality, action controllability, and temporal quality.
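Micro-World's action module injects action features through either ControlNet or adaLN. The plain-Python sketch here illustrates only the adaLN path: a per-frame keyboard and mouse action is encoded as a vector, projected to a scale and a shift, and used to modulate a normalized hidden state. The key vocabulary (`KEYS`), the encoding in `encode_action`, and the single linear projections are hypothetical choices for illustration, not Micro-World's actual code or action space.

```python
import math

# Hypothetical key vocabulary; Micro-World's real action space may differ.
KEYS = ["W", "A", "S", "D", "space", "shift"]


def encode_action(pressed, mouse_dx, mouse_dy):
    """Multi-hot keyboard vector followed by two continuous mouse deltas."""
    keys = [1.0 if k in pressed else 0.0 for k in KEYS]
    return keys + [float(mouse_dx), float(mouse_dy)]


def linear(vec, weight, bias):
    """Compute W @ vec + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def layer_norm(x, eps=1e-6):
    """LayerNorm without learned affine terms; adaLN supplies them instead."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def adaln(hidden, action_vec, w_scale, b_scale, w_shift, b_shift):
    """Modulate normalized hidden features with action-dependent scale and shift."""
    scale = linear(action_vec, w_scale, b_scale)
    shift = linear(action_vec, w_shift, b_shift)
    normed = layer_norm(hidden)
    return [n * (1.0 + s) + t for n, s, t in zip(normed, scale, shift)]


# Usage: one frame where W and space are held and the mouse moves horizontally.
action = encode_action({"W", "space"}, mouse_dx=0.5, mouse_dy=0.0)
hidden = [1.0, 2.0, 3.0, 4.0]
zeros_w = [[0.0] * len(action) for _ in hidden]
zeros_b = [0.0] * len(hidden)
out = adaln(hidden, action, zeros_w, zeros_b, zeros_w, zeros_b)
print(out)  # equals layer_norm(hidden) because the projections are all zero
```

With zero-initialized projections the modulation is an identity on the normalized features, so the output equals plain LayerNorm. Zero initialization is a common way to attach a new conditioning branch without disturbing a pretrained model at the start of training; whether Micro-World does this is not stated in the source.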

Key takeaway

For AI Scientists and Computer Vision Engineers developing interactive video generation or world models, Micro-World offers a fully open-sourced, performant baseline. You should explore its two-stage training paradigm and action processing module for improved action control and open-domain transferability, especially when leveraging AMD Instinct GPUs and the ROCm ecosystem. Consider adapting its dataset curation strategy for balanced action distributions in your own projects.
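The summary does not spell out how the curated dataset balances its action distribution. One common approach, shown here as an assumption rather than Micro-World's method, is to label each clip by its dominant action and sample clips with inverse-frequency weights so that every action class is drawn equally often. The action labels and the function names are hypothetical.

```python
import random
from collections import Counter


def dominant_action(frame_actions):
    """Most frequent per-frame action label in a clip (ties go to the first seen)."""
    return Counter(frame_actions).most_common(1)[0][0]


def inverse_frequency_weights(clips):
    """Sampling weight per clip so every dominant-action class gets equal total mass."""
    labels = [dominant_action(c) for c in clips]
    counts = Counter(labels)
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[label]) for label in labels]


def balanced_sample(clips, k, seed=0):
    """Draw k clip indices with replacement using the inverse-frequency weights."""
    weights = inverse_frequency_weights(clips)
    rng = random.Random(seed)
    return rng.choices(range(len(clips)), weights=weights, k=k)


# Usage with hypothetical labels: three "forward"-dominated clips and one "jump" clip.
clips = [
    ["forward"] * 5,
    ["forward"] * 3 + ["left"],
    ["forward"] * 4,
    ["jump", "jump", "forward"],
]
print(inverse_frequency_weights(clips))  # the lone "jump" clip gets weight 0.5
indices = balanced_sample(clips, k=1000)
```

Reweighting at sampling time leaves the stored dataset untouched; an alternative is to drop or duplicate clips offline, which fixes the balance once but discards or repeats data.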

Key insights

Micro-World is an open-source, action-controlled world model for interactive video generation, trained on Minecraft data.


Method

Micro-World uses a two-stage training scheme. First, LoRA adapts the base model to game visuals; second, an action module learns generalized control, so the LoRA weights can be removed for open-domain inference.
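The two-stage scheme can be sketched with a LoRA layer whose low-rank update can be switched off. A LoRA layer computes y = Wx + (alpha / r) * B(Ax), so disabling it leaves exactly the base model's output. The parameter-group names, the assumption that LoRA stays frozen in stage 2, and the choice to keep LoRA enabled for in-game scenes are illustrative assumptions; the source states only that LoRA is removed for open-domain inference.

```python
# Hypothetical parameter groups; names are illustrative, not Micro-World's.
STAGES = {
    "stage1_game_adaptation": {"lora"},           # adapt the base model to game visuals
    "stage2_action_learning": {"action_module"},  # learn control; LoRA frozen (assumed)
}


def trainable(stage):
    """Parameter groups updated in a stage; the base model stays frozen (assumed)."""
    return STAGES[stage]


def matvec(m, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer plus a switchable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, lora_a, lora_b, alpha):
        self.weight = weight              # frozen base matrix, out x in
        self.lora_a = lora_a              # r x in down-projection
        self.lora_b = lora_b              # out x r up-projection
        self.scale = alpha / len(lora_a)  # alpha / r
        self.use_lora = True

    def __call__(self, x):
        y = matvec(self.weight, x)
        if self.use_lora:
            delta = matvec(self.lora_b, matvec(self.lora_a, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y


def configure_inference(layers, domain):
    """Keep LoRA for in-game scenes (assumed); drop it for open-domain generation."""
    for layer in layers:
        layer.use_lora = domain == "game"


# Usage: a 2x2 identity base layer with a rank-1 LoRA update.
layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]], [[2.0], [0.0]], alpha=1.0)
print(layer([1.0, 2.0]))  # [7.0, 2.0]: base [1.0, 2.0] plus LoRA delta [6.0, 0.0]
configure_inference([layer], domain="open")
print(layer([1.0, 2.0]))  # [1.0, 2.0]: base model output only
```

Because the low-rank update is purely additive in this sketch, switching it off restores the base weights exactly, which is what makes dropping LoRA at open-domain inference cheap; the action module trained in stage 2 is then expected to carry the control signal on its own.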


Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Researcher


Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.