Micro-World: First AMD Open-Source World Models for Interactive Video Generation

2026-02-05 · Source: AMD ROCm Blogs · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Intermediate, long

Summary

AMD has introduced Micro-World, an open-source, action-controlled interactive world model for generating high-quality, open-domain video scenes. Built on the Wan2.1 model family, Micro-World comes in image-to-world (I2W) and text-to-world (T2W) variants. It was trained on more than 6,000 Minecraft gameplay clips, each 81 frames long and annotated with keyboard and mouse actions and a text caption. A two-stage training paradigm transfers action knowledge from the game environment to open-domain control: LoRA weights absorb the game-specific visual style, while a dedicated action processing module injects action features via ControlNet or Adaptive Layer Normalization (adaLN). The project open-sources its model weights, its complete training and inference code, and its curated dataset. The reported results, obtained on AMD Instinct GPUs within the ROCm ecosystem, show Micro-World outperforming models such as Oasis in image quality, action controllability, and temporal quality.

Key takeaway

For AI Scientists and Computer Vision Engineers developing interactive video generation or world models, Micro-World offers a fully open-sourced, performant baseline. You should explore its two-stage training paradigm and action processing module for improved action control and open-domain transferability, especially when leveraging AMD Instinct GPUs and the ROCm ecosystem. Consider adapting its dataset curation strategy for balanced action distributions in your own projects.
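
One generic way to approach the balanced action distribution mentioned above is inverse-frequency sampling. The sketch below is an illustration only, not AMD's curation pipeline: the `Clip` fields, the action labels, and the weighting rule are assumptions, and only the 81-frame clip length comes from the summary.

```python
from collections import Counter
from dataclasses import dataclass

FRAMES_PER_CLIP = 81  # clip length stated in the summary


@dataclass
class Clip:
    """Hypothetical record for one annotated gameplay clip."""
    caption: str
    keys: list  # one keyboard label per frame, e.g. "W" or "S" (assumed labels)


def action_histogram(clips):
    """Count how often each keyboard label appears across all frames."""
    counts = Counter()
    for clip in clips:
        counts.update(clip.keys)
    return counts


def clip_sampling_weights(clips):
    """Weight each clip by the mean inverse frequency of its per-frame actions,
    then normalize so the weights sum to 1. Clips with rare actions get more weight."""
    counts = action_histogram(clips)
    raw = [sum(1.0 / counts[k] for k in clip.keys) / len(clip.keys) for clip in clips]
    total = sum(raw)
    return [w / total for w in raw]


# Toy dataset: "move forward" dominates, "move backward" is rare.
clips = [Clip("forest walk", ["W"] * FRAMES_PER_CLIP) for _ in range(3)]
clips.append(Clip("backing away", ["S"] * FRAMES_PER_CLIP))

histogram = action_histogram(clips)
weights = clip_sampling_weights(clips)
```

Passing `weights` to a weighted sampler (for example `random.choices(clips, weights=weights, k=n)`) would draw the single backward-motion clip about as often as the three forward-motion clips combined.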

Key insights

Micro-World is an open-source, action-controlled world model for interactive video generation, trained on Minecraft data.

Principles

Open-sourcing fosters reproducibility and community progress.
Game data can proxy for learning transferable world dynamics.
Decoupling action learning from visual style improves transferability.

Method

Micro-World is trained in two stages. In the first, LoRA adapts the base model to game visuals. In the second, an action module learns generalized control. Because the game-specific style is confined to the LoRA weights, they can be removed for open-domain inference.
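
The role of LoRA in this scheme can be sketched with a toy linear layer. This is a minimal illustration in plain Python, not the project's code: the class name, the stage names, the parameter-group names, and the choice to freeze everything except the named group in each stage are assumptions made for the example.

```python
from dataclasses import dataclass


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


@dataclass
class LoRALinear:
    """Base weight plus an optional low-rank update: y = W x + scale * B (A x)."""
    weight: list   # base weight, shape (out, in)
    lora_a: list   # down-projection, shape (rank, in)
    lora_b: list   # up-projection, shape (out, rank)
    scale: float = 1.0
    lora_enabled: bool = True

    def forward(self, x):
        y = matvec(self.weight, x)
        if self.lora_enabled:
            delta = matvec(self.lora_b, matvec(self.lora_a, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y


# Assumed schedule: which parameter group is updated in each stage.
STAGE_TRAINABLE = {
    "stage1_game_style": {"lora"},       # LoRA absorbs the game's visual style
    "stage2_action": {"action_module"},  # action module learns control
}


def is_trainable(stage, group):
    return group in STAGE_TRAINABLE[stage]


layer = LoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 1.0]],
    lora_b=[[0.5], [0.0]],
)
x = [2.0, 3.0]
with_lora = layer.forward(x)      # training-time path, game style applied
layer.lora_enabled = False        # drop the LoRA update for open-domain inference
without_lora = layer.forward(x)   # base model output only
```

With the update disabled, the layer reduces to the base weight, which is the sense in which removing LoRA leaves the open-domain behaviour of the base model intact while any separately trained action module stays in place.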

In practice

Use a Minecraft API to collect data across diverse biomes.
Employ ControlNet or adaLN for action feature injection.
Train on AMD Instinct MI325X GPUs with ROCm.
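
Of the two injection options listed above, adaLN is the easier one to show in a few lines. The sketch below follows the common DiT-style formulation, in which a conditioning vector is projected to a per-channel scale and shift that modulate layer-normalized features. It is an assumption that Micro-World's action module resembles this; the function names, shapes, and the toy action embedding are hypothetical.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a feature vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weight, bias, v):
    """Affine map: weight has one row per output channel."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weight, bias)]


def adaln_modulate(hidden, action_emb, w_scale, b_scale, w_shift, b_shift):
    """adaLN: out = LayerNorm(hidden) * (1 + scale(action)) + shift(action)."""
    scale = linear(w_scale, b_scale, action_emb)
    shift = linear(w_shift, b_shift, action_emb)
    normed = layer_norm(hidden)
    return [n * (1.0 + s) + t for n, s, t in zip(normed, scale, shift)]


hidden = [1.0, 2.0, 3.0]   # one token's features (3 channels)
action_emb = [1.0, 0.0]    # toy action embedding, e.g. "move forward"
zeros_w = [[0.0, 0.0] for _ in hidden]
zeros_b = [0.0 for _ in hidden]

# Zero-initialized projections: the action has no effect yet.
identity_out = adaln_modulate(hidden, action_emb, zeros_w, zeros_b, zeros_w, zeros_b)

# A shift that responds to the first action dimension moves every channel by +1.
shift_w = [[1.0, 0.0] for _ in hidden]
shifted_out = adaln_modulate(hidden, action_emb, zeros_w, zeros_b, shift_w, zeros_b)
```

Starting from zero-initialized projections means the modulation is initially a no-op, so in this sketch the action pathway can be added to a pretrained block without disturbing its outputs at the start of training.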

Topics

World Models
Interactive Video Generation
Open-Source AI
AMD ROCm
Diffusion Models

Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.