OrbitForge: Text-to-3D Scene Generation via Reconstruction-Anchored Video Synthesis

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

OrbitForge is an adapter for text-to-3D scene generation that converts text-generated videos into canonical closed-orbit 3D Gaussian Splatting scenes. It addresses issues such as uncontrolled camera motion, partial view coverage, and temporal inconsistency in generic text-to-video models. OrbitForge uses 3D reconstruction as an anchor: it first builds a preliminary reconstruction via Deformable Gaussian Splatting with a MedianGS proxy, then renders views along a prescribed orbit to detect missing viewpoints. The text-to-video model completes only those missing views, and the completed orbit is then reconstructed into the final scene. This design avoids both task-specific fine-tuning and per-prompt score distillation. On a 300-prompt T3Bench-derived audit, OrbitForge attained a 359.0-degree median span and raised Q10 ImageReward from 8.07 to 16.36, while remaining competitive with VideoMV.
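To make the span figure concrete, here is a minimal sketch of one way an angular span could be computed from the camera azimuths a scene covers. The summary does not define the metric, so the definition used here (360 degrees minus the largest gap between neighbouring azimuths) is an assumption, and the function name `angular_span_deg` is hypothetical.

```python
import numpy as np


def angular_span_deg(azimuths_deg):
    """Smallest arc, in degrees, that contains every azimuth.

    Assumed definition (not taken from the source): 360 minus the largest
    gap between neighbouring azimuths, with wrap-around at 360 handled.
    """
    az = np.sort(np.mod(np.asarray(azimuths_deg, dtype=float), 360.0))
    if az.size == 0:
        return 0.0
    # Gaps between consecutive azimuths, plus the gap that wraps past 360.
    gaps = np.diff(np.concatenate([az, [az[0] + 360.0]]))
    return float(360.0 - gaps.max())


# A median over per-prompt spans would then be np.median(list_of_spans).
```

Under this definition, views at 350 and 10 degrees span 20 degrees rather than 340, because the arc through 0 is the shorter one that contains both.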

Key takeaway

For computer vision engineers generating 3D content from text, OrbitForge offers a way around the uncontrolled camera motion and partial view coverage of generic text-to-video models. Because it anchors video synthesis to a 3D reconstruction, it improves 3D consistency and view coverage without task-specific fine-tuning. Consider this technique when you need more complete and reliable 3D Gaussian Splatting scenes; it may improve asset quality and reduce manual correction.

Key insights

OrbitForge converts text-generated videos into consistent 3D Gaussian Splatting scenes by anchoring video synthesis to a 3D reconstruction.

Principles

Method

First, obtain a preliminary 3D reconstruction from the text-generated video via Deformable Gaussian Splatting. Next, render views from the prescribed orbit to find the missing perspectives. Then use the text-to-video model to complete only those missing views. Finally, reconstruct the completed orbit into the final Gaussian Splatting scene.
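The missing-view detection step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes each rendered orbit view has already been given a scalar coverage score in [0, 1] (for example, mean rendered opacity), and that views below a threshold count as missing. The names `orbit_azimuths` and `missing_segments`, the score itself, and the 0.5 threshold are all hypothetical.

```python
import numpy as np


def orbit_azimuths(n_views):
    """Evenly spaced azimuths (degrees) on a closed orbit, 360 excluded."""
    return np.linspace(0.0, 360.0, n_views, endpoint=False)


def missing_segments(coverage, threshold=0.5):
    """Group under-covered orbit views into circular runs.

    coverage: per-view scores along the orbit (assumed, e.g. mean opacity).
    Returns a list of (start_index, length) runs; a run may wrap past the
    end of the array because the orbit is closed.
    """
    missing = np.asarray(coverage, dtype=float) < threshold
    n = missing.size
    if not missing.any():
        return []
    if missing.all():
        return [(0, n)]
    # Begin scanning just after a covered view so no run is split in two.
    start = int(np.argmax(~missing))
    segments, run_start, run_len = [], None, 0
    for k in range(1, n + 1):
        i = (start + k) % n
        if missing[i]:
            if run_len == 0:
                run_start = i
            run_len += 1
        elif run_len:
            segments.append((run_start, run_len))
            run_len = 0
    return segments


# Example: 8 orbit views, with gaps at view 3 and across the wrap (7, 0).
scores = [0.1, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9, 0.1]
gaps = missing_segments(scores)
gap_angles = [orbit_azimuths(len(scores))[s] for s, _ in gaps]
```

Each returned run marks a contiguous arc of the orbit that would be handed to the text-to-video model for completion, while the views that are already covered stay as they are.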

Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.