PARSE: Part-Aware Relational Spatial Modeling

2025-11-12 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Data Science & Analytics · Depth: Expert, extended

Summary

The PARSE framework takes a part-centric approach to 3D spatial modeling, addressing the ambiguity in inter-object relations that coarse, object-level representations introduce. It centers on the Part-centric Assembly Graph (PAG), which encodes geometric relations between specific object parts, and a Part-Aware Spatial Configuration Solver, which converts these relations into geometric constraints to generate collision-free, physically valid 3D scenes. To support this, the authors built PARSE-10K, a dataset of 10,000 3D indoor scenes with dense part-level contact annotations, drawn from 17,372 part-segmented assets across 132 object categories. Experiments show that fine-tuning Qwen3-VL on PARSE-10K improves object-level layout reasoning (97.4% accuracy on Visual Relation MCQ) and part-level relation understanding (86.2% on Part-level Contact MCQ). Using PAGs as structural priors in 3D generation models also improves physical realism and structural complexity, as confirmed by user studies.

Key takeaway

AI scientists and 3D scene designers who want to strengthen spatial intelligence or generate physically consistent environments should prioritize part-level relational modeling. Frameworks like PARSE, which combines Part-centric Assembly Graphs with a Part-Aware Spatial Configuration Solver, can improve the realism and structural complexity of generated 3D scenes. Consider fine-tuning vision-language models on datasets like PARSE-10K to improve both object-level layout reasoning and part-level relation understanding. This approach moves beyond coarse object-level interactions toward more precise and physically plausible results.

Key insights

Part-level interactions are crucial for resolving spatial ambiguities and generating physically consistent 3D scenes.

Principles

Inter-object relations are best modeled at the part level.
Hierarchical DAGs simplify complex scene assembly.
Coarse-to-fine constraint application refines object pose space.
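
As a rough illustration of the first two principles, the sketch below stores part-level relations as directed edges and derives an assembly order from the resulting DAG. The class name, field names, relation labels, and example scene are all hypothetical and are not taken from PARSE; only the standard-library graphlib module is used.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class PartRelation:
    """One directed edge: a part of `child` is related to a part of `parent`."""
    parent: str
    parent_part: str
    child: str
    child_part: str
    relation: str  # hypothetical label such as "on" or "inside"


def assembly_order(relations):
    """Order objects so that every parent comes before the objects attached to it."""
    sorter = TopologicalSorter()
    for r in relations:
        # The child depends on its parent being placed first.
        sorter.add(r.child, r.parent)
    # static_order() raises graphlib.CycleError if the relations are not a DAG.
    return list(sorter.static_order())


# Hypothetical scene: relations name specific parts, not whole objects.
relations = [
    PartRelation("floor", "surface", "desk", "legs", "on"),
    PartRelation("desk", "tabletop", "lamp", "base", "on"),
    PartRelation("desk", "drawer", "notebook", "cover", "inside"),
]
order = assembly_order(relations)
print(order)
```

Naming the part on each side of an edge is what removes the ambiguity: "lamp on desk" alone does not say whether the lamp sits on the tabletop or in a drawer, while the edge above does.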

Method

The Part-Aware Spatial Configuration Solver instantiates a PAG in stages: it topologically sorts the objects, applies coarse object-level constraints, applies fine-grained part-level alignments, and samples collision-free poses, which are then refined by physics simulation.
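
The coarse-to-fine sampling step can be sketched in a heavily simplified 2D form. This is not the paper's solver: it assumes axis-aligned footprints, treats each child's whole footprint as its contacting part, uses plain rejection sampling, and omits the physics refinement. All names and dimensions are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned 2D footprint: centre (x, y) and half-extents (hx, hy)."""
    x: float
    y: float
    hx: float
    hy: float

    def overlaps(self, other):
        return (abs(self.x - other.x) < self.hx + other.hx
                and abs(self.y - other.y) < self.hy + other.hy)

    def contains(self, other):
        return (abs(self.x - other.x) + other.hx <= self.hx
                and abs(self.y - other.y) + other.hy <= self.hy)


def sample_pose(size, coarse_region, part_region, placed, rng, tries=1000):
    """Rejection-sample a footprint; return None if no valid pose is found."""
    hx, hy = size
    for _ in range(tries):
        # Coarse stage: draw the centre from the object-level support region.
        x = rng.uniform(coarse_region.x - coarse_region.hx,
                        coarse_region.x + coarse_region.hx)
        y = rng.uniform(coarse_region.y - coarse_region.hy,
                        coarse_region.y + coarse_region.hy)
        candidate = Box(x, y, hx, hy)
        # Fine stage: the footprint must lie fully inside the named parent part.
        if not part_region.contains(candidate):
            continue
        # Collision check against objects that were placed earlier.
        if any(candidate.overlaps(p) for p in placed):
            continue
        return candidate
    return None


rng = random.Random(0)
desk = Box(0.0, 0.0, 1.0, 0.5)      # object-level region (hypothetical sizes)
tabletop = Box(0.0, 0.0, 0.9, 0.4)  # part-level region within the desk
sizes = {"lamp": (0.1, 0.1), "monitor": (0.3, 0.1), "keyboard": (0.25, 0.08)}

placed = {}
for name, size in sizes.items():
    pose = sample_pose(size, desk, tabletop, list(placed.values()), rng)
    if pose is None:
        raise RuntimeError(f"no collision-free pose found for {name}")
    placed[name] = pose
    print(name, round(pose.x, 3), round(pose.y, 3))
```

Objects are assumed to arrive already in topological order, so each child is sampled only after its parent and earlier siblings are fixed. A fuller version would work in 3D with rotations and pass the result to a physics engine.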

In practice

Fine-tune VLMs with part-level contact data for improved spatial reasoning.
Integrate Part-centric Assembly Graphs as priors for realistic 3D scene generation.
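
To make the first suggestion concrete, here is one possible way to turn a single part-level contact annotation into a multiple-choice item for fine-tuning. The PARSE-10K schema and the exact MCQ wording are not described in this summary, so the function, field names, and question template below are assumptions made for illustration only.

```python
import random
import string


def contact_mcq(child, parent, true_part, parent_parts, rng, n_options=4):
    """Build one item asking which part of `parent` touches `child`."""
    # Distractors are the parent's other parts (hypothetical part vocabulary).
    distractors = [p for p in parent_parts if p != true_part]
    options = rng.sample(distractors, n_options - 1) + [true_part]
    rng.shuffle(options)
    letters = string.ascii_uppercase[:n_options]
    return {
        "question": f"Which part of the {parent} is the {child} in contact with?",
        "options": dict(zip(letters, options)),
        "answer": letters[options.index(true_part)],
    }


item = contact_mcq(
    "lamp", "desk", "tabletop",
    ["tabletop", "drawer", "legs", "side panel", "back panel"],
    random.Random(0),
)
print(item["question"])
for letter, part in item["options"].items():
    print(f"  {letter}. {part}")
print("answer:", item["answer"])
```

Drawing distractors from the same object's parts forces the model to discriminate at the part level rather than simply recognize the object.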

Topics

3D Scene Generation
Spatial Reasoning
Part-centric Assembly Graph
Vision-Language Models
Geometric Constraints
PARSE-10K Dataset

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.