OrbitForge: Text-to-3D Scene Generation via Reconstruction-Anchored Video Synthesis
Summary
OrbitForge is an adapter for text-to-3D scene generation that converts text-generated videos into canonical closed-orbit 3D Gaussian Splatting scenes. It targets three failure modes of generic text-to-video models: uncontrolled camera motion, partial view coverage, and temporal inconsistency. The method uses 3D reconstruction as an anchor. It first builds a preliminary reconstruction via Deformable Gaussian Splatting with a MedianGS proxy, then renders views from a prescribed orbit to detect missing viewpoints. The text-to-video model completes only those missing views, and the completed orbit is reconstructed into the final scene. This design avoids both task-specific fine-tuning and per-prompt score distillation. On a 300-prompt T3Bench-derived audit, OrbitForge attained a 359.0-degree median span and raised Q10 ImageReward from 8.07 to 16.36, while remaining competitive with VideoMV.
Key takeaway
For computer vision engineers building 3D content from text, OrbitForge offers a way around the uncontrolled camera motion and partial view coverage of direct text-to-video output. Its reconstruction-anchored video synthesis targets better 3D consistency and view coverage without task-specific fine-tuning. Consider it when you need complete closed-orbit 3D Gaussian Splatting scenes from a frozen video model; fuller coverage may also reduce manual correction of the resulting assets.
Key insights
OrbitForge converts text-generated videos into consistent 3D Gaussian Splatting scenes by using a preliminary 3D reconstruction as the anchor for view completion.
Principles
- 3D reconstruction can anchor video generation for consistency.
- Coverage-aware evaluation is crucial for 3D scene generation.
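To make coverage-aware evaluation concrete, here is a minimal Python sketch of one plausible span measure: the smallest arc that contains every observed camera azimuth, computed as 360 degrees minus the largest circular gap. The summary does not say how the reported median span is computed, so this definition and the name `angular_span_deg` are assumptions for illustration only.

```python
def angular_span_deg(azimuths_deg):
    """Smallest arc (in degrees) containing all azimuths.

    Assumed definition: 360 minus the largest circular gap between
    neighbouring azimuths. Not taken from the OrbitForge paper.
    """
    if len(azimuths_deg) < 2:
        return 0.0
    # Normalise to [0, 360) and sort so neighbours are adjacent on the circle.
    angles = sorted(a % 360.0 for a in azimuths_deg)
    # Gaps between consecutive angles, plus the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 360.0 - angles[-1])
    return 360.0 - max(gaps)


# A half orbit covers 180 degrees; two views straddling 0 cover only 20.
half_orbit = angular_span_deg([0, 90, 180])
straddle = angular_span_deg([350, 10])
```

A per-prompt span like this can then be aggregated across a prompt set, for example with `statistics.median`.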
Method
Obtain a preliminary 3D reconstruction via Deformable Gaussian Splatting. Render views along the prescribed orbit to find missing viewpoints. Complete only the missing views with the text-to-video model. Reconstruct the completed orbit into the final Gaussian Splatting scene.
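The missing-view detection step can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes each rendered orbit view has already been reduced to a scalar coverage score in [0, 1] (for example, the fraction of pixels the preliminary reconstruction renders with sufficient opacity), and it groups under-covered views into contiguous segments that wrap around the closed orbit. The function names, the coverage score, and the 0.5 threshold are all hypothetical.

```python
def orbit_azimuths(num_views):
    """Evenly spaced azimuths (degrees) on a closed orbit."""
    return [360.0 * i / num_views for i in range(num_views)]


def missing_segments(coverage, threshold=0.5):
    """Group under-covered orbit views into circular runs.

    coverage: one hypothetical score per orbit view, in orbit order.
    Returns a list of (start_index, length) runs, where a run may
    wrap past the last view back to index 0.
    """
    n = len(coverage)
    missing = [c < threshold for c in coverage]
    if not any(missing):
        return []
    if all(missing):
        return [(0, n)]
    # Begin scanning right after a covered view, so that no run is
    # split across the start of the scan.
    start = next(i for i in range(n) if not missing[i])
    segments = []
    run_start, run_len = None, 0
    for k in range(1, n + 1):
        i = (start + k) % n
        if missing[i]:
            if run_len == 0:
                run_start = i
            run_len += 1
        elif run_len:
            segments.append((run_start, run_len))
            run_len = 0
    return segments


# Eight orbit views, with two under-covered stretches.
scores = [0.9, 0.8, 0.1, 0.0, 0.7, 0.9, 0.2, 0.1]
gaps = missing_segments(scores)
azimuths = orbit_azimuths(len(scores))
```

Only the views inside each returned segment would be handed to the video model for completion; the covered views stay fixed and act as the anchor.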
In practice
- Utilize frozen video priors for 3D asset creation.
- Optimize a Gaussian Splatting scene per prompt for consistency.
Topics
- Text-to-3D Scene Generation
- Gaussian Splatting
- Video Synthesis
- 3D Reconstruction
- Computer Vision
- Deformable Gaussian Splatting
Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.