GASE: Gaussian Splatting-Based Automated System for Reconstructing Embodied-Simulation Environments

2026-06-16 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Robotics & Autonomous Systems, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

GASE (Gaussian Splatting-Based Automated System) reconstructs high-fidelity embodied-simulation environments for robot learning. It targets two inefficiencies in current reconstruction methods, data acquisition and foreground object extraction, both of which contribute to a significant sim-to-real gap. GASE captures multi-view video streams from panoramic camera arrays to scan environments rapidly. Its pipeline uses a camera-pose-based strategy to extract 2D objects robustly across frames, then inpaints the scene. Foreground objects and the static background are reconstructed independently and then integrated into physics simulators for policy training. In the reported experiments, GASE improves segmentation accuracy by more than 10% over existing 3D Gaussian-based methods and achieves state-of-the-art inpainting quality. In real-robot deployments on manipulation and navigation tasks, policies trained with GASE show a performance gap of less than 10% against policies trained solely on real-world data, supporting its effectiveness in bridging the sim-to-real gap.
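The summary does not say whether the "less than 10%" performance gap is an absolute difference in success rate or a difference relative to the real-data baseline. The sketch below shows both readings so the distinction is concrete. The function names and the success rates in the usage note are hypothetical illustrations, not figures from the paper.

```python
def absolute_gap(real_rate: float, sim_rate: float) -> float:
    """Gap in success rate, in the same unit as the inputs (fractions of 1)."""
    return real_rate - sim_rate


def relative_gap(real_rate: float, sim_rate: float) -> float:
    """Gap expressed as a fraction of the real-data baseline."""
    if real_rate <= 0:
        raise ValueError("real_rate must be positive to compute a relative gap")
    return (real_rate - sim_rate) / real_rate


def within_gap(real_rate: float, sim_rate: float, limit: float = 0.10,
               relative: bool = False) -> bool:
    """Check whether the chosen gap definition stays below the limit."""
    gap = relative_gap(real_rate, sim_rate) if relative else absolute_gap(real_rate, sim_rate)
    return gap < limit
```

With hypothetical success rates of 0.50 for the real-data policy and 0.44 for the simulation-trained policy, the absolute gap is 6 percentage points and passes a 10% limit, while the relative gap is 12% and does not. Which definition the paper uses should be checked against the original article.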

Key takeaway

For robotics engineers and AI scientists training embodied agents, GASE offers a practical way to narrow the sim-to-real gap. It automates the reconstruction of high-fidelity simulation environments from multi-view video, using robust object extraction to improve scene quality and segmentation accuracy. Consider GASE for rapidly constructing large-scale, cost-effective simulation scenes: in real-robot deployments, it showed a performance gap of less than 10% compared to policies trained on real-world data.

Key insights

GASE automates high-fidelity simulation environment reconstruction using multi-view video and Gaussian splatting, reducing the sim-to-real gap.

Principles

Multi-view panoramic video enables rapid environment scanning.
Independent reconstruction of foreground and background improves quality.
Camera-pose-based 2D object extraction enhances robustness.
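The summary does not describe how the camera-pose-based extraction works internally. One common way known camera poses help link an object across frames is reprojection: a 3D point seen in one view is mapped into another view using that view's pose. The sketch below illustrates this idea only. It assumes a simple pinhole camera (panoramic cameras would need a different projection model), and every name and parameter in it is hypothetical rather than taken from GASE.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def world_to_camera(point_world: Vec3,
                    rotation: Sequence[Sequence[float]],
                    translation: Vec3) -> Vec3:
    """Apply a world-to-camera pose: x_cam = R * x_world + t."""
    return tuple(
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project_pinhole(point_cam: Vec3, fx: float, fy: float,
                    cx: float, cy: float) -> Optional[Pixel]:
    """Project a camera-frame point to pixel coordinates (pinhole assumption)."""
    x, y, z = point_cam
    if z <= 0:
        return None  # point is behind the camera, so it is not visible
    return (fx * x / z + cx, fy * y / z + cy)


def reproject(point_world: Vec3,
              rotation: Sequence[Sequence[float]],
              translation: Vec3,
              fx: float, fy: float, cx: float, cy: float) -> Optional[Pixel]:
    """Find where a world point should appear in a frame with a known pose."""
    return project_pinhole(world_to_camera(point_world, rotation, translation),
                           fx, fy, cx, cy)
```

In a setup like this, a pixel location predicted by `reproject` could be used to seed or verify a 2D segmentation in the target frame, which is one plausible reason pose information makes extraction more consistent across frames.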

Method

GASE captures multi-view video with panoramic camera arrays, extracts 2D objects across frames using camera poses, inpaints the scene, then reconstructs foreground objects and the static background separately before importing them into a physics simulator.
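The ordering of these stages can be sketched as a simple sequential pipeline. This is a structural outline only: the stage functions are hypothetical placeholders that record their own names, and they do not implement or mirror any actual GASE code or API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SceneState:
    """Placeholder for data passed between stages; here it only logs stage names."""
    completed: List[str] = field(default_factory=list)


def make_stage(name: str) -> Callable[[SceneState], SceneState]:
    """Build a stand-in stage that records that it ran (no real processing)."""
    def stage(state: SceneState) -> SceneState:
        state.completed.append(name)
        return state
    return stage


# Stage order follows the method description; names are illustrative.
PIPELINE = [
    make_stage("capture_multiview_video"),    # panoramic camera arrays
    make_stage("extract_2d_objects"),         # camera-pose-based, across frames
    make_stage("inpaint_scene"),              # fill regions left by removed objects
    make_stage("reconstruct_foreground"),     # objects, independently of background
    make_stage("reconstruct_background"),     # static scene
    make_stage("import_into_simulator"),      # physics simulator for policy training
]


def run_pipeline(state: Optional[SceneState] = None) -> SceneState:
    """Run every stage in order on a shared state object."""
    if state is None:
        state = SceneState()
    for stage in PIPELINE:
        state = stage(state)
    return state
```

Calling `run_pipeline().completed` returns the stage names in execution order. Because foreground and background reconstruction are described as independent, those two stages could in principle run in either order or in parallel; the sequential form here is just the simplest layout.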

In practice

Use GASE for cost-effective, large-scale data augmentation.
Apply GASE to create high-fidelity robot manipulation tasks.
Deploy GASE for navigation policy training in simulations.

Topics

Gaussian Splatting
Embodied AI
Robot Learning
Simulation Environments
Sim-to-Real Transfer
Multi-view Reconstruction
Scene Inpainting

Best for: Computer Vision Engineer, Research Scientist, Robotics Engineer, Machine Learning Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.