Déjà View: Looping Transformers for Multi-View 3D Reconstruction
Summary
DéjàView is a multi-view 3D reconstruction model that applies a single transformer block recurrently for K refinement steps, running against the trend of scaling up model capacity in computer vision. It posits that depth in feed-forward transformers is often an inefficient way of simulating iteration, and it makes that iteration explicit in the architecture. Trained once, DéjàView exposes K as an inference-time compute knob and matches or outperforms substantially larger feed-forward baselines across five reconstruction benchmarks covering indoor, outdoor, object-centric, and driving scenes. It does so with a fraction of their parameters and comparable or lower compute, which suggests that explicit iteration provides a stronger inductive bias for multi-view 3D reconstruction.
Key takeaway
Machine learning engineers optimizing multi-view 3D reconstruction models should consider recurrent architectures like DéjàView as an alternative to deep feed-forward designs. The approach can match or exceed the performance of larger models with significantly fewer parameters, and inference compute can be adjusted through the number of refinement steps K. Explicit iteration is worth exploring in your own model designs, since it may provide a stronger inductive bias and better efficiency for complex 3D tasks.
Key insights
DéjàView applies a single transformer block recurrently for multi-view 3D reconstruction, matching or outperforming larger feed-forward models with a fraction of their parameters.
Principles
- Model depth can inefficiently simulate iteration in feed-forward transformers.
- Explicit iteration provides a stronger inductive bias for multi-view 3D reconstruction.
Method
DéjàView applies a single transformer block recurrently to per-view features for K refinement steps, exposing K as an inference-time compute knob.
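The looping idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the summary does not describe the block's internals, so the pre-norm single-head attention, the ReLU MLP, the concatenation of all views' tokens into one sequence, and every name here (`init_block`, `block`, `refine`) are assumptions made for the example.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token over its feature dimension (no learned scale/shift).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def init_block(d, rng, scale=0.02):
    # One set of weights; it is the only block the model owns.
    shapes = {"wq": (d, d), "wk": (d, d), "wv": (d, d), "wo": (d, d),
              "w1": (d, 4 * d), "w2": (4 * d, d)}
    return {name: rng.normal(0.0, scale, shape) for name, shape in shapes.items()}

def block(x, p):
    # Assumed design: pre-norm, single-head self-attention over all tokens
    # of all views, followed by a ReLU MLP, each with a residual connection.
    h = layer_norm(x)
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    x = x + (attn @ v) @ p["wo"]
    h = layer_norm(x)
    return x + np.maximum(h @ p["w1"], 0.0) @ p["w2"]

def refine(tokens, p, K):
    # Apply the SAME block K times; K is chosen at inference time.
    x = tokens
    for _ in range(K):
        x = block(x, p)
    return x

rng = np.random.default_rng(0)
views, tokens_per_view, d = 3, 4, 8          # hypothetical toy sizes
params = init_block(d, rng)
feats = rng.normal(size=(views * tokens_per_view, d))

coarse = refine(feats, params, K=2)          # cheaper, fewer refinement steps
fine = refine(feats, params, K=8)            # same weights, more compute
```

Because `refine` reuses one parameter set, changing K changes the amount of compute without changing the model size, which is the property the summary describes as an inference-time compute knob.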
In practice
- Achieve competitive 3D reconstruction with fewer model parameters.
- Dynamically adjust inference compute via K refinement steps at runtime.
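To see why looping saves parameters, here is a back-of-the-envelope count for a standard transformer block (four attention projections plus a two-layer MLP, with biases and two LayerNorms). The width, MLP ratio, and depth below are hypothetical values chosen for illustration; the summary does not report DéjàView's or the baselines' actual sizes, and the count ignores encoders and output heads.

```python
def block_params(d, mlp_ratio=4):
    """Parameter count of one standard transformer block of width d."""
    attn = 4 * d * d + 4 * d                      # Q, K, V, output projections + biases
    hidden = mlp_ratio * d
    mlp = d * hidden + hidden + hidden * d + d    # two linear layers + biases
    norms = 2 * 2 * d                             # two LayerNorms: scale and shift
    return attn + mlp + norms

def stacked_params(d, depth):
    # A feed-forward stack owns `depth` distinct blocks.
    return depth * block_params(d)

def looped_params(d):
    # A looped model owns one block, no matter how many steps K it runs.
    return block_params(d)

d, depth = 1024, 24                               # hypothetical sizes
ratio = stacked_params(d, depth) / looped_params(d)
print(f"stacked: {stacked_params(d, depth):,}")
print(f"looped:  {looped_params(d):,}")
print(f"ratio:   {ratio:.0f}x")
```

Under these assumptions, running the looped block for K steps costs roughly as much compute as a K-layer stack of the same width, while storing only one block's worth of weights.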
Topics
- Multi-View 3D Reconstruction
- Looping Transformers
- Model Efficiency
- Recurrent Neural Networks
- Computer Vision
- Inductive Bias
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.