ReCoSplat: Autoregressive Feed-Forward Gaussian Splatting Using Render-and-Compare
Summary
ReCoSplat is an autoregressive feed-forward Gaussian Splatting model designed for online novel view synthesis, capable of reconstructing scenes from sequential, potentially unposed observations, with or without camera intrinsics. It addresses the training dilemma of using ground-truth versus predicted poses by introducing a Render-and-Compare (ReCo) module. This module renders the current scene reconstruction from the predicted viewpoint and compares it against the incoming observation, generating a stable conditioning signal that mitigates pose errors. For processing long sequences, ReCoSplat incorporates a hybrid KV cache compression strategy, which combines early-layer truncation with chunk-level selective retention, effectively reducing the KV cache size by over 90% for sequences exceeding 100 frames. The model achieves state-of-the-art performance across various input settings on both in-distribution and out-of-distribution benchmarks.
Key takeaway
For research scientists developing real-time 3D reconstruction or novel view synthesis systems, ReCoSplat's Render-and-Compare (ReCo) module offers a way to handle pose uncertainty, improving stability when ground-truth poses are unavailable. Consider integrating a similar render-and-compare mechanism to make your own models more resilient to noisy or predicted camera poses.
Key insights
ReCoSplat uses a Render-and-Compare module and KV cache compression for robust, online novel view synthesis.
Principles
- Stable conditioning compensates for pose errors.
- Hybrid KV cache compression reduces memory.
- Local Gaussian assembly scales effectively.
Method
ReCoSplat employs a Render-and-Compare (ReCo) module to stabilize training with predicted poses by comparing rendered reconstructions with incoming observations. It also uses hybrid KV cache compression for long sequences.
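The render-and-compare idea can be illustrated with a toy sketch. This is not the paper's implementation: the 1D "scene", the integer "pose", the `toy_render` function, and the per-pixel absolute difference used as the comparison are all assumptions chosen to keep the example self-contained. It only shows the general pattern of rendering the current reconstruction from a predicted viewpoint and comparing the result with the incoming observation.

```python
from typing import Callable, Dict, List, Sequence

Image = List[float]


def toy_render(scene: Sequence[float], pose: int, width: int) -> Image:
    """Hypothetical stand-in renderer: a 'pose' is an integer offset into a 1D scene."""
    return [scene[(pose + i) % len(scene)] for i in range(width)]


def render_and_compare(
    scene: Sequence[float],
    predicted_pose: int,
    observation: Image,
    render: Callable[[Sequence[float], int, int], Image] = toy_render,
) -> Dict[str, object]:
    """Render the current reconstruction at the predicted pose and compare it
    with the incoming observation (assumed comparison: absolute difference)."""
    rendered = render(scene, predicted_pose, len(observation))
    residual = [abs(r - o) for r, o in zip(rendered, observation)]
    return {
        "rendered": rendered,
        "residual": residual,  # could serve as a conditioning signal
        "mean_error": sum(residual) / len(residual),
    }


scene = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
observation = toy_render(scene, 2, 3)  # observation taken at the true pose 2

good = render_and_compare(scene, 2, observation)  # predicted pose is correct
bad = render_and_compare(scene, 3, observation)   # predicted pose is off by one
print(good["mean_error"], bad["mean_error"])
```

In this toy setting the residual is zero when the predicted pose matches the true one and grows when it does not, which is the kind of information a downstream network could be conditioned on.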
In practice
- Apply ReCo for pose-error robust training.
- Implement hybrid KV cache for long sequences.
- Use Gaussian Splatting for novel view synthesis.
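As a rough illustration of the hybrid KV cache idea, the sketch below combines two pruning rules on a per-layer cache of chunks. The summary does not specify how truncation or retention is decided, so everything here is an assumption: the function name `compress_kv_cache`, the rule that early layers keep only the most recent chunks, and the hypothetical per-chunk importance `scores` used to pick which chunks later layers retain.

```python
from typing import Dict, List, Sequence


def compress_kv_cache(
    cache: Dict[int, List[str]],
    scores: Sequence[float],
    num_early_layers: int,
    early_keep: int,
    late_keep: int,
) -> Dict[int, List[str]]:
    """cache[layer] is a list of chunks (placeholders for key/value blocks);
    scores[i] is a hypothetical importance score for chunk i."""
    compressed: Dict[int, List[str]] = {}
    for layer, chunks in cache.items():
        if layer < num_early_layers:
            # Assumed early-layer truncation: keep only the most recent chunks.
            start = max(0, len(chunks) - early_keep)
            kept = list(range(start, len(chunks)))
        else:
            # Assumed chunk-level selective retention: keep the top-scoring
            # chunks, then restore their temporal order.
            ranked = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
            kept = sorted(ranked[:late_keep])
        compressed[layer] = [chunks[i] for i in kept]
    return compressed


num_layers, num_chunks = 4, 10
cache = {layer: [f"c{i}" for i in range(num_chunks)] for layer in range(num_layers)}
scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.5, 0.0]

compressed = compress_kv_cache(cache, scores, num_early_layers=2, early_keep=1, late_keep=2)
kept_total = sum(len(chunks) for chunks in compressed.values())
print(compressed, kept_total, "of", num_layers * num_chunks)
```

The sizes and scores are made up for the example; the actual reduction depends on the chosen layer split and retention budget.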
Topics
- Gaussian Splatting
- Novel View Synthesis
- Autoregressive Models
- Camera Pose Estimation
- KV Cache Compression
Best for: Research Scientist, AI Researcher, AI Scientist, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.