Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

Source: Hugging Face - Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Intermediate, long

Summary

This guide covers parameter-efficient fine-tuning of NVIDIA Cosmos Predict 2.5, a 2B-parameter world model that generates physically plausible video, using LoRA and DoRA. Fine-tuning adapts the base model to a specific domain such as robot manipulation while avoiding the catastrophic forgetting and high compute cost of full fine-tuning. Training runs on one or more GPUs with the `diffusers` and `accelerate` libraries, with a focus on generating synthetic robot trajectories. It uses a dataset of 92 robot manipulation videos and 50 (prompt, image) pairs and optimizes a rectified flow loss. Evaluation relies on Sampson error for geometric consistency and LLM-as-a-judge scores for physical plausibility and instruction following; both show significant improvements in video quality and task completion after fine-tuning.
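To make the rectified flow loss mentioned above concrete, here is a minimal sketch in plain Python. It assumes the common convention in which data sits at t=0 and Gaussian noise at t=1; the source does not state which convention Cosmos Predict 2.5 uses. The function names, the flat-list stand-in for a video latent, and the `zero_model` placeholder are all hypothetical and exist only for illustration.

```python
import random


def interpolate(x_data, noise, t):
    """Point on the straight path from data (t=0) to noise (t=1)."""
    return [(1.0 - t) * d + t * n for d, n in zip(x_data, noise)]


def velocity_target(x_data, noise):
    """Constant velocity of that straight path: d(x_t)/dt = noise - data."""
    return [n - d for d, n in zip(x_data, noise)]


def rectified_flow_loss(predict_velocity, x_data, rng):
    """Single-sample loss: mean squared error between predicted and target velocity."""
    noise = [rng.gauss(0.0, 1.0) for _ in x_data]
    t = rng.random()  # uniform timestep in [0, 1); real trainers may sample differently
    x_t = interpolate(x_data, noise, t)
    target = velocity_target(x_data, noise)
    pred = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)


rng = random.Random(0)
latent = [0.5, -1.0, 2.0]  # hypothetical stand-in for a flattened video latent


def zero_model(x_t, t):
    """Hypothetical untrained model that always predicts zero velocity."""
    return [0.0] * len(x_t)


print(rectified_flow_loss(zero_model, latent, rng))
```

In an actual run the model would be the adapter-equipped DiT and the latents would be tensors; the structure of the objective (interpolate, regress the velocity) is the part this sketch is meant to show.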

Key takeaway

For AI engineers building robot learning applications, fine-tuning a large video world model such as Cosmos Predict 2.5 with LoRA or DoRA is a practical way to generate high-quality synthetic trajectories. Compared with the base model, the fine-tuned model shows significantly better physical plausibility and instruction following, which makes data generation for robot policy training more efficient. Consider starting with LoRA r=8 for memory efficiency, or DoRA r=32 if low-rank LoRA shows instability.
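The memory trade-off between the two suggested starting points can be estimated with simple arithmetic. The sketch below counts trainable parameters for a single adapted linear layer. The layer width of 2048 is a hypothetical value, not taken from the source, and the DoRA count assumes one learned magnitude per output feature, which matches the PEFT implementation as far as I know.

```python
def lora_param_count(in_features, out_features, rank):
    """Trainable parameters for one linear layer: A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


def dora_param_count(in_features, out_features, rank):
    """DoRA keeps the LoRA factors and adds a learned magnitude vector.

    Assumption: one magnitude entry per output feature.
    """
    return lora_param_count(in_features, out_features, rank) + out_features


hidden = 2048  # hypothetical projection width, for illustration only

print(lora_param_count(hidden, hidden, 8))    # 32768
print(dora_param_count(hidden, hidden, 32))   # 133120
print(hidden * hidden)                        # 4194304 weights in the frozen base layer
```

Under these assumptions, DoRA at r=32 trains roughly four times as many parameters per layer as LoRA at r=8, while both remain a small fraction of the frozen base weight.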

Key insights

LoRA/DoRA fine-tuning effectively adapts large world models like Cosmos Predict 2.5 for domain-specific robot video generation.

Method

Fine-tune Cosmos Predict 2.5 by injecting LoRA/DoRA adapters into the DiT's attention and feedforward layers, freezing base weights, and optimizing with AdamW and a linear learning rate scheduler.
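As a rough illustration of what injecting an adapter into a frozen layer means, the sketch below builds the effective weight for LoRA and for DoRA on a toy 2x3 projection, using plain Python lists so it runs without dependencies. The helper names, the toy weights, and the per-output-row normalization for DoRA are assumptions made for this example; in practice an adapter library would wrap the DiT's attention and feed-forward layers.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_norms(w):
    """Euclidean norm of each output row."""
    return [math.sqrt(sum(v * v for v in row)) for row in w]


def lora_weight(w0, lora_a, lora_b, alpha):
    """Effective weight W0 + (alpha / r) * B @ A. W0 stays frozen; only A and B train."""
    scale = alpha / len(lora_a)  # len(lora_a) is the rank r
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]


def dora_weight(w0, lora_a, lora_b, alpha, magnitude):
    """DoRA sketch: normalize each row of the LoRA-updated weight, then rescale it
    by a learned magnitude. Assumes no row has zero norm."""
    merged = lora_weight(w0, lora_a, lora_b, alpha)
    return [[m * v / norm for v in row]
            for row, m, norm in zip(merged, magnitude, row_norms(merged))]


# Frozen base weight of a toy projection (out_features=2, in_features=3).
w0 = [[3.0, 0.0, 4.0], [0.0, 2.0, 0.0]]
lora_a = [[0.1, -0.2, 0.3]]  # (rank x in), small initial values
lora_b = [[0.0], [0.0]]      # (out x rank), zero init so training starts at the base model
magnitude = row_norms(w0)    # DoRA magnitude initialized to the base row norms

print(lora_weight(w0, lora_a, lora_b, alpha=1.0))
print(dora_weight(w0, lora_a, lora_b, alpha=1.0, magnitude=magnitude))
```

With `lora_b` at zero, both effective weights reproduce the base weight, which is why adapter training starts from the pretrained model's behavior. During optimization only `lora_a`, `lora_b`, and (for DoRA) `magnitude` would receive gradient updates.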

Best for: Machine Learning Engineer, AI Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.