HorizonForge: Driving Scene Editing with Any Trajectories and Any Vehicles

2026-02-26 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Advanced, extended

Summary

HorizonForge is a unified framework for generating photorealistic and controllable driving scenes, addressing the challenge of jointly achieving realism and precise control in autonomous driving simulation. It reconstructs scenes using editable Gaussian Splats and meshes, enabling fine-grained 3D manipulation and language-driven vehicle insertion. The framework renders edits through a noise-aware video diffusion process that enforces spatial and temporal consistency in a single feed-forward pass, avoiding per-trajectory optimization. The authors also introduce HorizonSuite, a comprehensive benchmark for evaluating ego- and agent-level editing tasks, including trajectory modifications and object manipulation. Extensive experiments show that HorizonForge achieves an 83.4% user-preference gain and a 25.19% FID improvement over the second-best method, validating the superior fidelity of its Gaussian-Mesh representation and the necessity of temporal priors from video diffusion for coherent synthesis.
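To make "fine-grained 3D manipulation" concrete: in an explicit Gaussian representation, moving a vehicle to a new pose amounts to applying a rigid transform to that object's Gaussians. The sketch below is a minimal illustration under stated assumptions, not HorizonForge's code. It assumes each object is stored as hypothetical arrays of Gaussian centers and 3x3 covariances, and the helper names `yaw_rotation` and `move_object_gaussians` are invented for this example. A Gaussian N(mu, Sigma) under x -> R x + t becomes N(R mu + t, R Sigma R^T); rotating view-dependent color coefficients (for example spherical harmonics) is deliberately omitted.

```python
import numpy as np


def yaw_rotation(theta: float) -> np.ndarray:
    """Rotation about the vertical (z) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def move_object_gaussians(means: np.ndarray, covs: np.ndarray,
                          R: np.ndarray, t: np.ndarray):
    """Apply the rigid transform x -> R x + t to a set of 3D Gaussians.

    means: (N, 3) Gaussian centers; covs: (N, 3, 3) covariance matrices.
    Hypothetical helper: a Gaussian N(mu, Sigma) maps to N(R mu + t, R Sigma R^T).
    View-dependent color terms are NOT rotated here (assumption for brevity).
    """
    new_means = means @ R.T + t
    # np.matmul broadcasts the (3, 3) rotation over the leading N axis.
    new_covs = R @ covs @ R.T
    return new_means, new_covs


def move_mesh_vertices(vertices: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply the same rigid transform to (V, 3) mesh vertices."""
    return vertices @ R.T + t
```

Applying the same R and t to both the Gaussians and any mesh vertices of an object keeps the two parts of a Gaussian-Mesh representation aligned; a new trajectory is then just a sequence of (R, t) pairs, one per frame.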

Key takeaway

For AI scientists and research scientists developing autonomous driving simulation, HorizonForge offers a robust framework for generating highly realistic, controllable driving scenarios. Its combination of 3D Gaussian Splats, meshes, and video diffusion improves visual fidelity and temporal consistency while allowing precise manipulation of ego and agent trajectories. Consider integrating similar 3D representations and video diffusion techniques to make your simulation environments more realistic and controllable, especially when evaluating long-tail and safety-critical events.
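Ego-trajectory manipulation typically starts from the recorded ego poses and perturbs them. A lateral shift, such as a lane change, is one simple case, chosen here purely for illustration; the summary does not specify which ego edits HorizonSuite contains. The sketch below assumes planar poses given as hypothetical (x, y) positions plus yaw, and the helpers `offset_trajectory` and `lane_change_profile` are invented names. It keeps the original yaw, which a real pipeline would likely re-estimate from the shifted path.

```python
import numpy as np


def offset_trajectory(xy: np.ndarray, yaw: np.ndarray, lateral: np.ndarray) -> np.ndarray:
    """Shift each 2D pose sideways by `lateral` metres (positive = left of heading).

    xy: (N, 2) positions; yaw: (N,) headings in radians; lateral: (N,) offsets.
    Hypothetical helper; headings are left unchanged for simplicity.
    """
    left_normal = np.stack([-np.sin(yaw), np.cos(yaw)], axis=-1)  # unit vectors
    return xy + lateral[:, None] * left_normal


def lane_change_profile(n: int, width: float, start: int, end: int) -> np.ndarray:
    """Smoothstep ramp of the lateral offset from 0 to `width` between frames start and end."""
    s = np.clip((np.arange(n) - start) / max(end - start, 1), 0.0, 1.0)
    return width * (3.0 * s**2 - 2.0 * s**3)
```

The smoothstep ramp has zero slope at both ends, so the shifted path has no kinks where the lane change begins and ends; the edited poses would then be fed to the renderer as new camera positions.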

Key insights

HorizonForge enables photorealistic, controllable driving scene generation using 3D Gaussian Splats and meshes with video diffusion.

Principles

3D Gaussian Splats encode richer appearance cues, which enables more accurate edits.
Temporal priors from video diffusion are essential for coherent synthesis.
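Part of why splats carry rich appearance cues is that every primitive has its own color and opacity, and a pixel is formed by compositing the depth-sorted splats that cover it. The sketch below shows the standard 3D Gaussian Splatting compositing rule, C = sum_i c_i a_i prod_{j<i} (1 - a_j), for a single pixel. It is a generic illustration, not HorizonForge's renderer (the summary does not describe it). The helper names are hypothetical, and it assumes the 2D projected means and covariances are already computed and that colors are view-independent RGB.

```python
import numpy as np


def splat_alpha(opacity: float, mean2d, cov2d, pixel) -> float:
    """Opacity contributed by one projected 2D Gaussian at a pixel location."""
    d = np.asarray(pixel, dtype=float) - np.asarray(mean2d, dtype=float)
    # Mahalanobis distance via a linear solve instead of an explicit inverse.
    m = d @ np.linalg.solve(np.asarray(cov2d, dtype=float), d)
    return float(opacity * np.exp(-0.5 * m))


def composite_pixel(colors, alphas) -> np.ndarray:
    """Front-to-back alpha compositing of splats sorted nearest first.

    colors: (K, 3) RGB per splat; alphas: (K,) per-splat alpha at this pixel.
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance before splat i is the product of (1 - alpha_j) for j < i.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = alphas * transmittance
    return weights @ colors
```

Because each splat's contribution is weighted by how much light the nearer splats let through, moving or recoloring one object's splats changes only the pixels it covers. This locality is what makes explicit Gaussians convenient for edits.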

Method

HorizonForge reconstructs scenes into editable 3D Gaussian Splats and meshes, then renders edits via a noise-aware video diffusion model to ensure spatio-temporal consistency and support language-guided object insertion.
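The summary does not explain how the video diffusion model is made "noise-aware". One common way to refine an imperfect render with a diffusion model is SDEdit-style partial noising: the rendered clip is pushed to an intermediate diffusion step and then denoised from there. The sketch below shows only the forward-noising half, x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, under a linear DDPM beta schedule. It is an assumed stand-in technique, not the paper's mechanism. The denoiser is omitted, and `strength` and the helper names are hypothetical.

```python
import numpy as np


def alpha_bar_schedule(num_steps: int = 1000, beta_start: float = 1e-4,
                       beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products of (1 - beta_t) for a linear DDPM beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_rendered_video(frames: np.ndarray, strength: float,
                         alpha_bar: np.ndarray, rng: np.random.Generator):
    """Partially noise a rendered clip before handing it to a video denoiser.

    frames: (T, H, W, C) rendered frames scaled to [-1, 1].
    strength in [0, 1] selects the start step t = strength * (num_steps - 1).
    Returns (x_t, t); a (hypothetical) video diffusion model would denoise x_t from t.
    """
    t = int(round(strength * (len(alpha_bar) - 1)))
    eps = rng.standard_normal(frames.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * frames + np.sqrt(1.0 - a) * eps, t
```

The strength trades faithfulness for repair: a low value keeps the rendered geometry and edits nearly intact, while a high value lets the diffusion model hallucinate more, which can fix rendering artifacts but may drift from the requested edit.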

In practice

Use Gaussian Splats for high-fidelity 3D scene representation.
Employ video diffusion models for temporal consistency in video generation.

Topics

Driving Scene Generation
Gaussian Splatting
Video Diffusion Models
Autonomous Driving Simulation
3D Object Editing

Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.