Video Reconstruction using Diffusion-based Image-to-Video Generation with Trajectory Guidance
Summary
A new pipeline uses GPS telemetry to reconstruct missing frames in top-down drone video of autonomous surface vehicles (ASVs) performing maritime maneuvers. The method converts raw GPS coordinates and a single reference frame into a trajectory-guided video sequence using SG-I2V, a pre-trained image-to-video diffusion model, without domain-specific fine-tuning. GPS coordinates are projected into image space via an equirectangular mapping, yielding per-vessel motion cues that condition the diffusion model. Evaluated against ground-truth video, the SG-I2V pipeline produced the most natural-looking frames (BRISQUE 25.52 vs. 23.64 for ground truth), the most realistic motion magnitude (temporal smoothness 1.14 vs. 1.42 for ground truth), and the strongest GPS trajectory adherence (9.31 px vs. 28.70 px for ground truth), outperforming optical flow extrapolation and RIFE interpolation baselines in challenging low-texture, small-object conditions.
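The equirectangular GPS-to-pixel projection mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a north-up, straight-down (nadir) image with uniform ground resolution, plus one known anchor whose GPS position and pixel position are both available. The names `gps_to_pixel`, `box_around`, `ref_px`, and `metres_per_pixel` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def gps_to_pixel(lat, lon, ref_lat, ref_lon, ref_px, metres_per_pixel):
    """Map (lat, lon) in degrees to (u, v) pixel coordinates.

    Uses the local equirectangular approximation around (ref_lat, ref_lon),
    which is assumed to appear at pixel ref_px in a north-up nadir image.
    """
    # East-west offset shrinks with cos(latitude); north-south does not.
    east_m = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    north_m = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    u = ref_px[0] + east_m / metres_per_pixel
    v = ref_px[1] - north_m / metres_per_pixel  # image y grows downward
    return u, v


def box_around(center, width, height):
    """Axis-aligned box (x_min, y_min, x_max, y_max) centred on a pixel."""
    u, v = center
    return (u - width / 2, v - height / 2, u + width / 2, v + height / 2)
```

Projecting each vessel's GPS fix this way gives a pixel-space path, and a box around the first projected point is one simple way to mark the vessel in the reference frame. A rotated or tilted camera would need an additional rotation or homography, which this sketch omits.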
Key takeaway
For research scientists working on video reconstruction in challenging environments such as maritime surveillance, this work shows that combining GPS telemetry with an image-to-video diffusion model can synthesize missing frames when visual signals alone are insufficient. Consider incorporating auxiliary sensor data to provide explicit motion cues: in this evaluation it improved frame naturalness, motion realism, and trajectory adherence relative to optical flow extrapolation and RIFE interpolation.
Key insights
Trajectory-guided diffusion models can reconstruct missing video frames by integrating external sensor data.
Principles
- Auxiliary sensor data enhances visual synthesis.
- Diffusion models generalize without fine-tuning.
- Spatial and temporal coherence are critical for video.
Method
The pipeline has three stages: GPS-to-pixel mapping, bounding-box initialization, and trajectory-conditioned video generation with SG-I2V. The output is then evaluated quantitatively against ground-truth video.
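As an illustration of the evaluation step, trajectory adherence can be expressed as the mean pixel distance between where a vessel appears in each frame and where its GPS fix projects to. This summary does not say exactly how the reported adherence figure was computed, so the definition below is an assumption, and the name `mean_trajectory_error_px` is hypothetical.

```python
import math


def mean_trajectory_error_px(tracked, projected):
    """Mean Euclidean distance in pixels between two per-frame point lists.

    tracked:   (u, v) vessel positions measured in the video frames
    projected: (u, v) positions obtained by projecting GPS fixes
    """
    if not tracked or len(tracked) != len(projected):
        raise ValueError("point lists must be non-empty and of equal length")
    total = sum(math.dist(p, q) for p, q in zip(tracked, projected))
    return total / len(tracked)
```

Under this definition a lower value means the rendered motion follows the GPS track more closely.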
In practice
- Use GPS telemetry to guide video reconstruction.
- Project real-world coordinates into image space.
- Employ pre-trained diffusion models for synthesis.
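One practical detail when applying the points above: GPS fixes and video frames are often sampled at different rates, so projected positions may need to be resampled to frame timestamps before they can serve as a per-frame trajectory. The sketch below uses plain linear interpolation. Whether the original pipeline resamples this way is not stated here, so treat it as an assumption; `interpolate_track` is a hypothetical helper.

```python
from bisect import bisect_right


def interpolate_track(fix_times, fix_points, frame_times):
    """Linearly interpolate 2D points at each frame time.

    fix_times must be strictly increasing. Frame times outside the
    covered range are clamped to the first or last fix.
    """
    out = []
    for t in frame_times:
        if t <= fix_times[0]:
            out.append(fix_points[0])
            continue
        if t >= fix_times[-1]:
            out.append(fix_points[-1])
            continue
        i = bisect_right(fix_times, t)  # fix_times[i - 1] <= t < fix_times[i]
        t0, t1 = fix_times[i - 1], fix_times[i]
        w = (t - t0) / (t1 - t0)
        (x0, y0), (x1, y1) = fix_points[i - 1], fix_points[i]
        out.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return out
```

For example, two fixes one second apart can be expanded into one position per frame by passing the frame timestamps as `frame_times`.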
Topics
- Video Reconstruction
- Diffusion Models
- Image-to-Video Generation
- Trajectory Guidance
- Autonomous Surface Vehicles
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.