WOMBET: World Model-based Experience Transfer for Robust and Sample-efficient Reinforcement Learning

2026-04-16 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

WOMBET (World Model-based Experience Transfer) is a novel framework for robust and sample-efficient reinforcement learning (RL) that addresses the high cost and risk of data collection in robotics. Unlike traditional offline-to-online RL methods that assume a fixed dataset, WOMBET jointly generates and utilizes prior data. It learns a world model in a source task, then generates offline data through uncertainty-penalized planning, filtering trajectories for high return and low epistemic uncertainty. This curated data is then used for online fine-tuning in a target task, employing adaptive sampling to balance offline and online data. WOMBET demonstrates improved sample efficiency and final performance over strong baselines on continuous control benchmarks, effectively unifying model-based offline data generation with model-free online adaptation.

Key takeaway

For research scientists developing RL systems for robotics or other data-scarce domains, WOMBET offers an approach to experience transfer built around two mechanisms. Uncertainty-aware data generation produces a curated prior dataset from a learned world model, and adaptive sampling keeps online fine-tuning stable as new data arrives. Both are worth considering when the goal is to reduce real-world interaction and improve sample efficiency.

Key insights

WOMBET unifies uncertainty-aware model-based data generation with adaptive online fine-tuning for efficient RL experience transfer.

Principles

Uncertainty-penalized planning yields a provable lower bound on true return.
Dual-criterion filtering curates reliable, high-value offline data.
Adaptive sampling balances offline data stability with online adaptation.
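
The first principle can be written out in the general shape that penalized-reward bounds take in model-based offline RL (for example, MOPO-style results). This is a sketch under assumptions, not the paper's theorem: the symbols \hat{r}, u, \lambda, and \hat{M} are introduced here for illustration, and the bound holds only when u(s,a) upper-bounds the model error and \lambda is chosen large enough. The exact statement in WOMBET may differ.

```latex
% Penalized reward: model reward minus a scaled uncertainty estimate
\tilde{r}(s,a) \;=\; \hat{r}(s,a) \;-\; \lambda\, u(s,a)

% Return of policy \pi under the learned model \hat{M} with the penalized
% reward lower-bounds its return J(\pi) in the true environment
J(\pi) \;\ge\; \mathbb{E}_{\hat{M},\,\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, \tilde{r}(s_t, a_t) \right]
```

Read this way, a planner that maximizes the right-hand side is maximizing a conservative estimate: a plan scores well only if it is both rewarding and well covered by the model.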

Method

WOMBET iteratively refines a world model, generates offline data via uncertainty-penalized MPC and dual-criterion filtering, then fine-tunes online using adaptive data mixing and implicit regularization.
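
The planning step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy 1-D dynamics, the random-shooting planner, the reward, and every name (`make_ensemble`, `penalized_return`, `plan`, `lam`) are assumptions chosen to keep the example self-contained. It shows ensemble disagreement used as an epistemic-uncertainty estimate and subtracted from the reward during model-predictive planning.

```python
import random
import statistics


def make_ensemble(n_models=5, seed=0):
    """Hypothetical ensemble: member i predicts s' = s + k_i * a, with slightly different gains k_i."""
    rng = random.Random(seed)
    return [1.0 + rng.gauss(0.0, 0.05) for _ in range(n_models)]


def ensemble_step(gains, state, action):
    """Return the mean next-state prediction and the ensemble disagreement (std. dev.)."""
    preds = [state + k * action for k in gains]
    return statistics.fmean(preds), statistics.pstdev(preds)


def reward(state, action):
    """Toy reward (an assumption): stay near the origin while using small actions."""
    return -(state ** 2) - 0.01 * (action ** 2)


def penalized_return(gains, state, actions, lam=1.0, gamma=0.99):
    """Roll out an action sequence in the model; score it with reward minus lam * uncertainty."""
    total, total_unc = 0.0, 0.0
    for t, a in enumerate(actions):
        nxt, unc = ensemble_step(gains, state, a)
        total += (gamma ** t) * (reward(state, a) - lam * unc)
        total_unc += unc
        state = nxt
    return total, total_unc


def plan(gains, state, horizon=5, n_candidates=200, lam=1.0, seed=0):
    """Random-shooting MPC: sample action sequences, return the first action of the best one."""
    rng = random.Random(seed)
    best_actions, best_score = None, float("-inf")
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        score, _ = penalized_return(gains, state, actions, lam)
        if score > best_score:
            best_actions, best_score = actions, score
    return best_actions[0], best_score


if __name__ == "__main__":
    gains = make_ensemble()
    action, score = plan(gains, state=1.0)
    print("first action:", action, "penalized score:", score)
```

Setting `lam=0.0` recovers ordinary model-based planning, so the same function can be used to compare penalized and unpenalized plans. A real system would replace the linear ensemble with learned networks and the random-shooting loop with a stronger optimizer.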

In practice

Use ensemble models to estimate epistemic uncertainty.
Filter generated trajectories by both return and uncertainty.
Dynamically adjust offline/online data mix based on TD error.
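
The last two practices above can be sketched as follows. Everything here is illustrative: the trajectory dictionaries with `ret` and `unc` keys, the quantile and threshold values, and in particular the mixing rule in `offline_fraction` are assumptions. The source says only that the mix is adjusted based on TD error; the rule below (sample more from whichever buffer currently has the larger mean absolute TD error, within fixed bounds) is one plausible choice, not necessarily the paper's.

```python
import random


def filter_trajectories(trajs, return_quantile=0.5, max_uncertainty=0.1):
    """Dual-criterion filter: keep trajectories with high return AND low uncertainty."""
    returns = sorted(t["ret"] for t in trajs)
    cutoff = returns[int(return_quantile * (len(returns) - 1))]
    return [t for t in trajs if t["ret"] >= cutoff and t["unc"] <= max_uncertainty]


def offline_fraction(td_offline, td_online, floor=0.1, ceil=0.9, eps=1e-8):
    """Hypothetical rule: offline share proportional to its share of mean |TD error|, clipped."""
    mean_off = sum(abs(x) for x in td_offline) / len(td_offline)
    mean_on = sum(abs(x) for x in td_online) / len(td_online)
    frac = mean_off / (mean_off + mean_on + eps)
    return min(ceil, max(floor, frac))


def sample_batch(offline_buf, online_buf, frac_offline, batch_size, rng):
    """Draw a mixed batch with the requested offline share (sampling with replacement)."""
    n_off = round(frac_offline * batch_size)
    batch = [rng.choice(offline_buf) for _ in range(n_off)]
    batch += [rng.choice(online_buf) for _ in range(batch_size - n_off)]
    return batch


if __name__ == "__main__":
    trajs = [
        {"ret": 1.0, "unc": 0.05},
        {"ret": 5.0, "unc": 0.50},  # high return, but the model is unsure: dropped
        {"ret": 4.0, "unc": 0.02},
        {"ret": 0.0, "unc": 0.01},  # model is confident, but return is low: dropped
    ]
    kept = filter_trajectories(trajs)
    frac = offline_fraction(td_offline=[0.2, 0.4], td_online=[0.9, 0.7])
    batch = sample_batch(kept, [{"ret": 2.0, "unc": 0.0}], frac, 8, random.Random(0))
    print(len(kept), frac, len(batch))
```

The floor and ceiling keep both data sources represented in every batch, which is one simple way to avoid the fine-tuning run collapsing onto either the offline prior or the still-sparse online data.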

Topics

Reinforcement Learning
Experience Transfer
World Models
Offline-to-Online RL
Uncertainty-Aware Planning

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.