MatterDoor: Sampling Zero-shot Spatio-semantic Priors using Generative Models

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

The "MatterDoor" pipeline, developed by researchers at the Australian National University, is a sampling-based method that uses large-scale pretrained generative models to produce probabilistic priors for robotic planning under partial observability. The approach is zero-shot: conditioned only on partial observations, it recovers complete RGB-D point cloud samples annotated with occupancy and target semantics. The pipeline was evaluated on a Matterport3D benchmark of 10 indoor scenes, in which a robot must navigate through a doorway into a partially visible room to reach an unobserved target object. Experiments ran on a system with a single NVIDIA RTX 4090 GPU (24 GB VRAM), 64 GB system memory, and an Intel i9-900K processor. They showed that the approach recovers commonsense spatial semantics consistent with ground truth and yields diverse, clean 3D point clouds usable in motion planning. Inference takes approximately 10.5 seconds per sample.

Key takeaway

For robotics engineers developing autonomous navigation systems for uncertain indoor environments, this research suggests a viable way to plan under partial observability. Consider using generative models to sample spatio-semantic priors, so that the motion planner can reason about unobserved occupancy and likely target locations. The approach, demonstrated with a Stretch robot, produces 3D environment samples that a planner can use directly, which can improve task success probability and path planning in complex, occluded scenes.

Key insights

Generative models can sample spatio-semantic priors for robotic planning in partially observed 3D environments.

Principles

Priors are vital for planning under partial observability.
Generative models can capture diverse environment uncertainty.
Spatio-semantic priors connect workspace to configuration space.
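
To make the idea of a sampled prior concrete, the sketch below turns a set of sampled point clouds into a per-voxel occupancy probability by counting how often each voxel is occupied across samples. This is a minimal Monte Carlo illustration, not the paper's implementation: the function name, the grid parameters (origin, voxel_size, grid_shape), and the toy samples are all assumptions made for the example.

```python
import numpy as np


def occupancy_probability(samples, origin, voxel_size, grid_shape):
    """Return, per voxel, the fraction of samples with at least one point in it.

    samples    : list of (N_i, 3) arrays, one point cloud per generated sample
    origin     : world coordinate of the grid corner at index (0, 0, 0)
    voxel_size : edge length of a cubic voxel, in the same units as the points
    grid_shape : (nx, ny, nz) number of voxels along each axis
    All names and parameters here are hypothetical, chosen for illustration.
    """
    origin = np.asarray(origin, dtype=float)
    shape = np.asarray(grid_shape)
    counts = np.zeros(grid_shape, dtype=float)
    for points in samples:
        pts = np.asarray(points, dtype=float).reshape(-1, 3)
        # Map each point to the integer index of the voxel that contains it.
        idx = np.floor((pts - origin) / voxel_size).astype(int)
        # Discard points that fall outside the grid.
        inside = np.all((idx >= 0) & (idx < shape), axis=1)
        idx = idx[inside]
        # Mark each voxel at most once per sample, then accumulate.
        occupied = np.zeros(grid_shape, dtype=bool)
        occupied[idx[:, 0], idx[:, 1], idx[:, 2]] = True
        counts += occupied
    return counts / len(samples)


# Toy example: two samples agree on one obstacle and disagree on another.
sample_a = np.array([[0.1, 0.1, 0.1]])
sample_b = np.array([[0.1, 0.1, 0.1], [1.1, 0.1, 0.1]])
prob = occupancy_probability(
    [sample_a, sample_b], origin=(0.0, 0.0, 0.0), voxel_size=0.5, grid_shape=(4, 4, 4)
)
print(prob[0, 0, 0], prob[2, 0, 0])
```

A planner could treat voxels with high probability as likely obstacles and voxels with intermediate probability as regions where the samples disagree, which is where the uncertainty captured by the generative model matters most.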

Method

The pipeline uses VLM-conditioned image outpainting, monocular depth estimation, and semantic segmentation to generate floor-aligned RGB-D point clouds for planning.
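
The last step of such a pipeline, lifting an RGB image and a depth map into a colored point cloud, can be sketched with a standard pinhole camera model. The example below is a generic illustration under stated assumptions, not the authors' code: the function name and the intrinsics (fx, fy, cx, cy) are hypothetical, and the floor-alignment step mentioned above is not shown.

```python
import numpy as np


def backproject_rgbd(rgb, depth, fx, fy, cx, cy):
    """Back-project an RGB-D image into camera-frame 3D points with colors.

    rgb   : (H, W, 3) color image
    depth : (H, W) depth along the optical axis; non-positive or non-finite
            values are treated as missing
    fx, fy, cx, cy : pinhole intrinsics in pixels (assumed known)
    """
    h, w = depth.shape
    # u is the column (x) pixel coordinate, v is the row (y) pixel coordinate.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    colors = rgb[valid]
    return points, colors


# Toy 2x2 image with one missing depth value.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
depth = np.array([[2.0, 2.0], [2.0, 0.0]])
points, colors = backproject_rgbd(rgb, depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(points.shape, colors.shape)
```

Monocular depth estimates are often only defined up to scale, so in practice the depth map would need to be brought into metric units before the resulting points are useful for planning.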

In practice

Use constrained VLM prompts for better semantic recovery.
Quantize generative models (e.g., int4) to fit VRAM constraints.
Align sampled point clouds with observed ground truth for planning.
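
The quantization advice comes down to simple arithmetic on weight storage. The sketch below estimates how much memory a model's weights occupy at different precisions; the 12-billion-parameter model size is a hypothetical figure chosen for illustration and does not come from the paper. The estimate covers weights only and ignores activations, caches, and the small overhead of quantization scales.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory needed to store model weights, in GiB.

    Weights only: activations and quantization metadata are not included.
    """
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical model size, used only to illustrate the arithmetic.
num_params = 12e9
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(num_params, bits):.1f} GiB")
```

Under this assumption, 16-bit weights alone would take up nearly all of a 24 GB card, while int4 weights need roughly a quarter of that, leaving room for the other models in the pipeline.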

Topics

Robotic Planning
Generative Models
Spatio-Semantic Priors
Partial Observability
Matterport3D
Motion Planning

Best for: Research Scientist, AI Scientist, Robotics Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.