Hallucination in World Models is Predictable and Preventable

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

Hallucination in modern generative world models, where rollouts visually drift from ground-truth dynamics, is identified as a data coverage problem. The researchers introduced MMBench2, a 427-hour, 210-task dataset for visual world modeling, and trained a 350M-parameter world model. They identified three distinct hallucination modes (perceptual, action-marginalized, and scene-diverging) and developed three signals that accurately predict model failures. To mitigate hallucination, they applied coverage-aware sampling during training and used the hallucination predictors as curiosity rewards for targeted data collection. The resulting finetuning recipe adapts pretrained models to unseen environments with as few as 50 real-environment trajectories, showing that the detection signals can also guide mitigation.

Key takeaway

If you are developing or deploying generative world models, treat hallucination primarily as a data coverage issue: that framing is what makes it predictable and fixable. Integrate coverage-aware sampling during training and use hallucination predictors as curiosity rewards for efficient online finetuning. This lets your models adapt to novel environments with far less new data, potentially as few as 50 real trajectories.
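As a rough illustration of how a failure predictor could double as a curiosity reward, the sketch below scores candidate actions by how much an ensemble of models disagrees about the next state, then picks the most uncertain one for data collection. This summary does not say which three signals the authors use, so ensemble disagreement is a swapped-in stand-in, and the names `disagreement_reward` and `pick_most_curious` are hypothetical.

```python
from statistics import pvariance


def disagreement_reward(predictions):
    """Mean per-dimension variance across ensemble predictions.

    `predictions` is a list of equal-length next-state vectors, one per
    ensemble member. High variance is used here as an ASSUMED proxy for
    "the model is likely to hallucinate"; the source's signals may differ.
    """
    dims = list(zip(*predictions))  # regroup values by state dimension
    return sum(pvariance(d) for d in dims) / len(dims)


def pick_most_curious(candidates):
    """Return the action whose predicted outcome is most uncertain.

    `candidates` maps an action label to the ensemble's predictions for it.
    """
    return max(candidates, key=lambda a: disagreement_reward(candidates[a]))


# Toy example: the ensemble agrees about "left" and disagrees about "right".
candidates = {
    "left": [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]],
    "right": [[0.0, 2.0], [2.0, 2.0], [1.0, 5.0]],
}
chosen = pick_most_curious(candidates)
```

Collecting real trajectories where the reward is highest concentrates a small data budget on the regions the model covers worst, which is the intuition behind using a detection signal to drive mitigation.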

Key insights

Hallucination in world models is a predictable and preventable data coverage problem.

Method

Develop coverage-aware sampling for training and use hallucination predictors as curiosity rewards for data-efficient online finetuning, adapting models with as few as 50 real trajectories.
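One way to picture coverage-aware sampling is to draw training examples with probability proportional to how sparsely their neighbourhood is covered. The sketch below is a minimal illustration under assumptions, not the authors' procedure: it assumes each sample has a feature vector, estimates sparsity by mean distance to the k nearest neighbours, and the helper names and `temperature` parameter are hypothetical.

```python
import math
import random


def knn_distance(point, others, k=3):
    """Mean Euclidean distance from `point` to its k nearest neighbours."""
    dists = sorted(math.dist(point, o) for o in others)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def coverage_weights(features, k=3, temperature=1.0):
    """Sampling probabilities that grow with local sparsity (assumed scheme)."""
    if len(features) < 2:
        raise ValueError("need at least two samples to estimate coverage")
    scores = []
    for i, f in enumerate(features):
        others = features[:i] + features[i + 1:]
        # Small epsilon keeps the total positive when samples are duplicates.
        scores.append(knn_distance(f, others, k) ** temperature + 1e-12)
    total = sum(scores)
    return [s / total for s in scores]


def sample_batch(features, batch_size, k=3, seed=0):
    """Draw sample indices, favouring poorly covered regions."""
    rng = random.Random(seed)
    weights = coverage_weights(features, k)
    return rng.choices(range(len(features)), weights=weights, k=batch_size)


# Toy data: a dense cluster of four points plus one isolated point.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
weights = coverage_weights(features, k=2)
batch = sample_batch(features, batch_size=8, k=2)
```

In this toy setup the isolated point receives most of the sampling mass, so rare states are seen more often during training than uniform sampling would allow. Raising `temperature` above 1 sharpens that preference; lowering it moves back toward uniform sampling.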

Best for: Research Scientist, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.