Surflo: Consistent 3D Surface Flow Model with Global State

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital / Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, medium

Summary

Surflo is a 3D surface flow model that addresses limitations of existing feed-forward reconstruction methods by introducing a global state, which yields consistent, high-resolution output from a variable number of unposed RGB views. Per-view models produce a separate pointmap for each view, and these pointmaps overlap; global-latent methods avoid the overlap but decode to a fixed, low-resolution output. Surflo instead compresses the input images into K latent tokens and decodes oriented 3D surface points by transporting samples from noise onto the surface with flow matching. Because the number of noise samples is a free choice, a single forward pass can produce anywhere from a few thousand to a million points. To keep the output consistent, an inference-time guidance term injects a photometric gradient during ODE integration, which correlates nearby points. Surflo matches or exceeds feed-forward baselines on surface metrics and is significantly faster than optimization-based techniques, which require hundreds of views. The authors describe it as the only feed-forward approach that combines a global latent with arbitrary-resolution decoding.
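The summary does not say how the K latent tokens are computed. One common way to map a variable number of image tokens to a fixed-size set is cross-attention from K learned query vectors (a Perceiver-style bottleneck), and the sketch below assumes that design purely for illustration. The names `compress_views`, `latent_queries`, `w_k`, and `w_v` are hypothetical, and the random weights stand in for trained parameters.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def compress_views(view_tokens, latent_queries, w_k, w_v):
    """Cross-attend K hypothetical learned queries over the tokens of all views.

    view_tokens: list of (T_i, D) arrays, one per input view (any count, any T_i).
    latent_queries: (K, D) array of query vectors.
    Returns a (K, D) global state whose size does not depend on the number of views.
    """
    tokens = np.concatenate(view_tokens, axis=0)  # (sum T_i, D)
    keys = tokens @ w_k                           # (sum T_i, D)
    values = tokens @ w_v                         # (sum T_i, D)
    scale = 1.0 / np.sqrt(latent_queries.shape[-1])
    weights = softmax(latent_queries @ keys.T * scale, axis=-1)  # (K, sum T_i)
    return weights @ values                       # (K, D)


# Usage: two views and five views both compress to the same K x D state.
rng = np.random.default_rng(0)
D, K = 16, 8
latent_queries = rng.standard_normal((K, D))
w_k = rng.standard_normal((D, D)) / np.sqrt(D)
w_v = rng.standard_normal((D, D)) / np.sqrt(D)
two_views = [rng.standard_normal((64, D)) for _ in range(2)]
five_views = [rng.standard_normal((48, D)) for _ in range(5)]
state_2 = compress_views(two_views, latent_queries, w_k, w_v)
state_5 = compress_views(five_views, latent_queries, w_k, w_v)
print(state_2.shape, state_5.shape)  # (8, 16) (8, 16)
```

A fixed K keeps the decoder's conditioning the same size no matter how many views arrive, which is the property the summary attributes to Surflo's global state; the attention mechanism itself is an assumption.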

Key takeaway

For computer vision engineers who need high-fidelity 3D surface reconstruction from sparse, unposed RGB images, Surflo offers a compelling alternative. It produces oriented point clouds at arbitrary resolution, from a few thousand to a million points in a single forward pass, significantly faster than optimization-based methods. Because it pairs a global latent state with flow matching, it yields consistent geometry without being tied to a fixed output grid. Consider evaluating Surflo where reconstruction speed matters and feed-forward quality is acceptable; the paper reports results comparable to or better than existing feed-forward baselines on surface metrics.

Key insights

Surflo uses a global latent state and flow matching to reconstruct 3D surfaces at arbitrary resolution from a variable number of input images.
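Flow matching trains a network to predict the velocity that moves a noise sample toward a data sample. The sketch below uses the common linear (rectified-flow) path x_t = (1 - t) x0 + t x1 with target velocity x1 - x0; the summary does not say which path or loss weighting Surflo uses, so treat both as assumptions. Each point is a 6-D vector (position plus unit normal) to mirror the oriented outputs, and `velocity_fn` is a hypothetical stand-in for a decoder conditioned on the K latent tokens.

```python
import numpy as np


def interpolate(x0, x1, t):
    # Linear probability path between noise x0 (t = 0) and data x1 (t = 1).
    return (1.0 - t) * x0 + t * x1


def flow_matching_loss(velocity_fn, x1, latent, rng):
    """Monte Carlo estimate of the conditional flow-matching loss.

    x1: (N, 6) oriented surface points (xyz + normal) from the ground truth.
    latent: conditioning state, e.g. the (K, D) tokens; passed through untouched.
    """
    x0 = rng.standard_normal(x1.shape)                # one noise sample per point
    t = rng.uniform(0.0, 1.0, size=(x1.shape[0], 1))  # one time value per point
    xt = interpolate(x0, x1, t)
    target = x1 - x0                                  # constant velocity along the line
    pred = velocity_fn(xt, t, latent)
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))


def zero_model(xt, t, latent):
    # Placeholder "network" that always predicts zero velocity.
    return np.zeros_like(xt)


# Usage: ground-truth points on a unit sphere, whose normals equal their positions.
rng = np.random.default_rng(0)
positions = rng.standard_normal((256, 3))
positions /= np.linalg.norm(positions, axis=1, keepdims=True)
x1 = np.concatenate([positions, positions], axis=1)
loss = flow_matching_loss(zero_model, x1, latent=None, rng=rng)
print(loss > 0.0)  # True
```

In training, the decoder's parameters would be updated to drive this loss down; at inference, the learned velocity field is integrated from t = 0 to t = 1.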


Method

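Surflo compresses a variable number of RGB views into K latent tokens, then decodes oriented 3D surface points by integrating a flow-matching ODE that carries samples from noise onto the surface. At inference time, a photometric gradient is added during ODE integration to keep nearby points consistent.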
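Decoding amounts to integrating the learned velocity field from noise at t = 0 to the surface at t = 1, and the number of noise samples sets the output resolution. The sketch below assumes a fixed-step Euler integrator and adds a scaled guidance term at each step. Both callables are hypothetical placeholders: `toy_velocity` is the exact linear-path velocity toward a unit sphere rather than a trained decoder, and `toy_guidance` is the negative gradient of a made-up energy, standing in for Surflo's photometric gradient, whose computation this summary does not describe. Only positions are tracked; normals are omitted for brevity.

```python
import numpy as np


def sample_surface(velocity_fn, guidance_fn, latent, num_points,
                   num_steps=32, guidance_scale=0.0, seed=0):
    """Euler-integrate dx/dt = v(x, t) + s * g(x, t) from t = 0 to t = 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_points, 3))  # resolution = number of noise samples
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # stays below 1, so toy_velocity never divides by zero
        v = velocity_fn(x, t, latent)
        g = guidance_fn(x, t)
        x = x + dt * (v + guidance_scale * g)
    return x


def toy_velocity(x, t, latent):
    # Velocity of the straight path from x to its projection on the unit sphere,
    # arriving at t = 1. Stands in for a trained, latent-conditioned decoder.
    target = x / np.linalg.norm(x, axis=1, keepdims=True)
    return (target - x) / (1.0 - t)


def toy_guidance(x, t):
    # Negative gradient of the made-up energy E(x) = 0.5 * z**2, which nudges
    # points toward the z = 0 plane. A photometric term would replace this.
    g = np.zeros_like(x)
    g[:, 2] = -x[:, 2]
    return g


# The same model decodes a few thousand or many more points in one call.
small = sample_surface(toy_velocity, toy_guidance, None, num_points=2_000)
large = sample_surface(toy_velocity, toy_guidance, None, num_points=200_000)
print(small.shape, large.shape)  # (2000, 3) (200000, 3)
print(np.allclose(np.linalg.norm(small, axis=1), 1.0))  # True

# Same seed, so the same starting noise; only the guidance scale differs.
guided = sample_surface(toy_velocity, toy_guidance, None, num_points=2_000,
                        guidance_scale=0.5)
print(np.abs(guided[:, 2]).mean() < np.abs(small[:, 2]).mean())  # True
```

Because guidance only adds a term to the velocity inside the integration loop, it needs no retraining; this mirrors the summary's description of an inference-time term, although the real photometric gradient would couple points through the input images rather than act on each point independently.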


Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Machine Learning Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.