ReCoSplat: Autoregressive Feed-Forward Gaussian Splatting Using Render-and-Compare

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

ReCoSplat is an autoregressive feed-forward Gaussian Splatting model for online novel view synthesis. It reconstructs scenes from sequential observations that may be posed or unposed, with or without camera intrinsics. It addresses the training dilemma of using ground-truth versus predicted poses by introducing a Render-and-Compare (ReCo) module. This module renders the current scene reconstruction from the predicted viewpoint and compares the result against the incoming observation, producing a stable conditioning signal that mitigates pose errors. For long sequences, ReCoSplat uses a hybrid KV cache compression strategy that combines early-layer truncation with chunk-level selective retention, reducing the KV cache size by over 90% for sequences exceeding 100 frames. The model achieves state-of-the-art performance across these input settings on both in-distribution and out-of-distribution benchmarks.
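The hybrid compression idea can be pictured with a toy sketch. The summary does not give the actual retention rules, so everything below is an assumption made for illustration: the `Chunk` record, the `score` field (a stand-in for some relevance measure), the reading of "early-layer truncation" as keeping only the most recent chunks, and the reading of "selective retention" as keeping the newest chunk plus the highest-scoring older ones. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    frame_id: int    # index of the frame chunk this KV block came from
    num_tokens: int  # number of cached key/value tokens in the chunk
    score: float     # hypothetical relevance score; not defined in the source


def compress_layer(chunks, layer_idx, num_early_layers, window, keep_top_k):
    """Return the chunks retained for one layer under the assumed policy."""
    if layer_idx < num_early_layers:
        # Assumed early-layer truncation: keep only the most recent chunks.
        return chunks[-window:] if window > 0 else []
    # Assumed chunk-level selective retention: always keep the newest chunk,
    # plus the `keep_top_k` highest-scoring older chunks.
    newest = chunks[-1:]
    older = sorted(chunks[:-1], key=lambda c: c.score, reverse=True)[:keep_top_k]
    # Restore temporal order so later attention sees chunks in sequence.
    return sorted(older + newest, key=lambda c: c.frame_id)


def compress_cache(cache, num_early_layers, window, keep_top_k):
    """Apply the per-layer policy to a cache given as a list of layers."""
    return [
        compress_layer(chunks, i, num_early_layers, window, keep_top_k)
        for i, chunks in enumerate(cache)
    ]


def cache_tokens(cache):
    """Total number of cached tokens across all layers."""
    return sum(c.num_tokens for layer in cache for c in layer)


if __name__ == "__main__":
    # Illustrative sizes only: 8 layers, 120 frame chunks, 256 tokens per chunk.
    full = [
        [Chunk(f, 256, (f * 37 % 101) / 101.0) for f in range(120)]
        for _ in range(8)
    ]
    small = compress_cache(full, num_early_layers=4, window=2, keep_top_k=8)
    reduction = 1 - cache_tokens(small) / cache_tokens(full)
    print(f"retained chunks per layer: {[len(layer) for layer in small]}")
    print(f"cache reduction: {reduction:.1%}")
```

The layer count, window, and top-k values are chosen only to show how the two mechanisms compose; the reduction they yield says nothing about the figures reported for ReCoSplat.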

Key takeaway

For research scientists developing real-time 3D reconstruction or novel view synthesis systems, ReCoSplat's Render-and-Compare (ReCo) module offers a robust way to handle pose uncertainty, improving stability when ground-truth poses are unavailable. Consider integrating a similar render-and-compare mechanism to make your own models more resilient to noisy or predicted camera poses.

Key insights

ReCoSplat uses a Render-and-Compare module and KV cache compression for robust, online novel view synthesis.

Principles

Method

ReCoSplat employs a Render-and-Compare (ReCo) module to stabilize training with predicted poses by comparing rendered reconstructions with incoming observations. It also uses hybrid KV cache compression for long sequences.
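A minimal sketch of the render-and-compare idea, under heavy simplifying assumptions: the scene is a list of hypothetical `(position, intensity)` points on a 1D line, the camera pose is a single integer shift, and the "renderer" drops each point into one pixel. The real model uses a Gaussian Splatting rasterizer and a learned network that consumes the signal; the source does not say how the comparison is encoded, so the per-pixel (observation, rendered, residual) triple here is an illustrative choice.

```python
def render(points, cam_shift, width):
    """Toy 1D renderer: place each (position, intensity) point into a pixel."""
    image = [0.0] * width
    for position, intensity in points:
        pixel = position - cam_shift  # integer pose shift, for simplicity
        if 0 <= pixel < width:
            image[pixel] += intensity
    return image


def render_and_compare(points, predicted_shift, observation):
    """Render the current reconstruction from the predicted pose and compare
    it with the incoming observation, returning a per-pixel conditioning signal."""
    rendered = render(points, predicted_shift, len(observation))
    return [(obs, ren, obs - ren) for obs, ren in zip(observation, rendered)]


def residual_energy(signal):
    """Sum of absolute residuals: zero when rendering matches the observation."""
    return sum(abs(residual) for _, _, residual in signal)


if __name__ == "__main__":
    scene = [(2, 1.0), (5, 0.5)]                  # current reconstruction
    observation = render(scene, 1, 8)             # frame seen from true shift 1
    good = render_and_compare(scene, 1, observation)   # accurate predicted pose
    bad = render_and_compare(scene, 2, observation)    # pose off by one pixel
    print(residual_energy(good), residual_energy(bad))
```

The point of the sketch is that the residual exposes pose error directly: an accurate predicted pose yields an empty residual, while a wrong one leaves a structured mismatch that a downstream network could condition on.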

Topics

Best for: Research Scientist, AI Researcher, AI Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.