Stream3D: Sequential Multi-View 3D Generation via Evidential Memory

2026-05-20 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Stream3D is a training-free streaming mechanism that addresses temporal inconsistency in 3D object generation from long monocular video streams. Existing view-conditioned 3D generators, such as SAM 3D, TRELLIS, and Hunyuan3D, produce high-quality reconstructions from single views but become severely inconsistent over time when applied independently to each frame of a sequence. Stream3D converts any frozen view-conditioned 3D generator into a streaming generator by adding a compact "evidential memory." This memory caches only the most informative historical frames, ranked by a proposed evidence score, and is updated as new frames arrive so that it always holds a fixed number of frames. Because the cache size is fixed, the memory footprint does not grow with stream length, and the method avoids degradation over long sequences, all without retraining, architectural modifications, or auxiliary losses for the underlying generator. Stream3D outperforms latent-transport baselines, including KV-cache reuse and flow-based feature editing, on both photometric and geometric metrics across realistic and synthetic streaming benchmarks.

Key takeaway

For computer vision engineers building real-time 3D reconstruction systems on video streams, Stream3D targets the temporal inconsistency that appears when a single-view generator is run frame by frame. Consider this training-free mechanism when adapting an existing high-quality, view-conditioned 3D generator to a streaming setting. It is designed to keep 3D outputs consistent over long sequences with a bounded memory footprint, without costly retraining or architectural modifications.

Key insights

Stream3D uses an evidential memory and evidence score mechanism to enable consistent 3D generation from sequential multi-view inputs without retraining existing models.

Principles

Temporal consistency is critical for streaming 3D generation.
Selective memory caching prevents degradation over long sequences.
Training-free adaptation preserves original generator quality.

Method

Stream3D maintains a compact evidential memory that selectively caches informative historical frames, ranked by an evidence score. As new frames arrive, the memory is updated but kept at a fixed size. This converts a frozen view-conditioned 3D generator into a streaming one without modifying the generator itself.
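
The fixed-size cache described above can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not define the evidence score or say how cached frames condition the generator, so score_fn, generate, and the class name EvidentialMemory are hypothetical placeholders. The sketch only shows one plausible eviction rule, where the lowest-scoring cached frame is replaced when a higher-scoring frame arrives.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List


@dataclass(order=True)
class _Entry:
    # Entries compare by (score, index); the frame payload is ignored.
    score: float
    index: int
    frame: Any = field(compare=False)


class EvidentialMemory:
    """Fixed-capacity cache keeping the highest-scoring frames (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._heap: List[_Entry] = []  # min-heap: weakest entry at the root

    def update(self, index: int, frame: Any, score: float) -> bool:
        """Offer a frame to the cache; return True if it was stored."""
        entry = _Entry(score, index, frame)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if entry > self._heap[0]:
            # Evict the weakest cached frame in favour of the new one.
            heapq.heapreplace(self._heap, entry)
            return True
        return False

    def frames(self) -> List[Any]:
        """Cached frames in their original temporal order."""
        return [e.frame for e in sorted(self._heap, key=lambda e: e.index)]


def stream(
    frames: Iterable[Any],
    generate: Callable[[Any, List[Any]], Any],  # assumed frozen generator wrapper
    score_fn: Callable[[Any], float],           # assumed evidence score
    capacity: int = 4,
) -> List[Any]:
    """Run the generator per frame, conditioning on the cached history."""
    memory = EvidentialMemory(capacity)
    outputs = []
    for i, frame in enumerate(frames):
        outputs.append(generate(frame, memory.frames()))
        memory.update(i, frame, score_fn(frame))
    return outputs
```

A min-heap keeps each update at O(log K) for a cache of K frames, so the per-frame cost and the memory footprint stay constant regardless of stream length. How the real method scores frames and feeds the cache to the generator may differ from this sketch.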

In practice

Adapt single-view 3D generators for video streams.
Manage temporal context with evidential memory.
Evaluate streaming 3D generation via photometric/geometric metrics.
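
As an illustration of the last point, the sketch below computes one common photometric metric (PSNR) and one common geometric metric (symmetric Chamfer distance). The summary does not say which metrics the paper reports, so these two are assumptions chosen as typical examples. The code uses only the standard library for portability; an array library would normally be used for real images and point clouds.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two flattened images, in dB."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance using squared nearest-neighbour distances."""
    if not a or not b:
        raise ValueError("point sets must be non-empty")
    a_to_b = sum(min(_sq_dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(_sq_dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a
```

For a streaming setting, these would typically be computed per frame against ground truth and then averaged or plotted over time, so that drift across the sequence is visible rather than hidden in a single aggregate.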

Topics

3D Generation
Multi-View Synthesis
Temporal Consistency
Evidential Memory
Streaming Algorithms
Computer Vision

Best for: Research Scientist, AI Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.