Reinforcement Learning from Cross-domain Videos with Video Prediction Model

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

XIPER (Cross-domain Video Prediction Reward) is a reward model for reinforcement learning from expert videos collected in a domain that looks different from the agent's own. It addresses two obstacles: the videos carry no reward signal, and the domain gap can be large, whether it comes from differences in agent color or morphology or from the sim-to-real gap. XIPER trains a cross-domain video prediction model that maps agent observations into the expert domain, then uses the prediction likelihood as the reward. In experiments, XIPER consistently outperforms baselines on 8 tasks in the DMC Color Suite and 3 tasks in the DMC Body Suite, including settings with substantial domain differences. An analysis on a sim-to-real transfer dataset further shows that XIPER produces meaningful reward signals for real-robot observations using only simulated expert videos.

Key takeaway

For machine learning engineers training reinforcement learning agents from cross-domain expert videos, XIPER's video prediction reward model is worth evaluating as a way to bridge visual domain gaps such as differences in agent color or morphology, or the sim-to-real gap. Because it can score real-robot observations using only simulated expert videos, it may reduce reward engineering effort in environments where hand-designed rewards are hard to specify.

Key insights

XIPER enables reinforcement learning from cross-domain expert videos by using video prediction likelihood as a reward signal.

Principles

Method

XIPER trains a cross-domain video prediction model to map agent observations to the expert domain, then uses the prediction likelihood as the reward signal.
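
To make the likelihood-as-reward idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. Several things are assumptions made purely for illustration: the summary does not say how the likelihood is computed, so the sketch assumes a fixed-variance Gaussian around a predicted next frame; `predict_next` is a hypothetical stand-in for the trained cross-domain video prediction model (any mapping into the expert domain would live inside it); and `linear_extrapolation` is a toy predictor used only so the example runs.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]  # a flattened frame of pixel intensities (assumed format)


def gaussian_log_likelihood(target: Frame, mean: Frame, sigma: float = 0.1) -> float:
    """Log-density of `target` under an isotropic Gaussian centred at `mean`.

    Assumption: a fixed-variance Gaussian observation model. The source does
    not specify the likelihood used by XIPER.
    """
    if len(target) != len(mean):
        raise ValueError("frames must have the same length")
    sq_err = sum((t - m) ** 2 for t, m in zip(target, mean))
    log_norm = len(target) * math.log(sigma * math.sqrt(2.0 * math.pi))
    return -0.5 * sq_err / sigma**2 - log_norm


def likelihood_rewards(
    frames: Sequence[Frame],
    predict_next: Callable[[Sequence[Frame]], Frame],
    context: int = 2,
    sigma: float = 0.1,
) -> List[float]:
    """Score each frame by how likely it is given the preceding `context` frames.

    `predict_next` is a hypothetical placeholder for the learned video
    prediction model; it receives the context frames and returns a predicted
    next frame.
    """
    rewards = []
    for t in range(context, len(frames)):
        predicted = predict_next(frames[t - context:t])
        rewards.append(gaussian_log_likelihood(frames[t], predicted, sigma))
    return rewards


def linear_extrapolation(history: Sequence[Frame]) -> Frame:
    """Toy predictor (not a learned model): continue the last per-pixel change."""
    prev, last = history[-2], history[-1]
    return [2.0 * b - a for a, b in zip(prev, last)]
```

With this toy predictor, a trajectory that moves the way the predictor expects receives higher rewards than an erratic one. For example, `likelihood_rewards([[0.0], [0.1], [0.2], [0.3]], linear_extrapolation)` scores higher in total than the same call on `[[0.0], [0.1], [0.9], [0.0]]`. In an actual system the predictor would be a model trained on expert videos, and the rewards would feed a standard reinforcement learning algorithm.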

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.