Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

2026-06-04 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

DIRECT (Decomposed Injection for Reference Composition and Target-integration) is a framework for 3D-aware object insertion. It targets a gap in current diffusion-based methods, which lack explicit 3D pose control. The work, published on 2026-06-04, combines interactive pose manipulation with high-fidelity 2D image synthesis, so users can set an object's 3D pose during insertion. The method decomposes the insertion conditions into three components: appearance guidance from the reference object, geometry guidance from a user-adjusted 3D proxy, and context guidance from the target background. Each component is injected through its own pathway to prevent feature entanglement, with the aim of preserving the reference appearance, adhering to the specified pose, and adapting the object to the scene. DIRECT also includes an automated data construction pipeline that improves the diversity and quality of the training data. The reported experiments show DIRECT outperforming prior approaches in both geometric controllability and visual quality.

Key takeaway

For computer vision engineers and 3D artists who need precise control over object placement in image synthesis, DIRECT adds what diffusion-based insertion methods typically lack: explicit 3D pose manipulation. You adjust a 3D proxy interactively, and the object is composited in that pose with its reference appearance preserved, going beyond what 2D inpainting offers. This could streamline workflows for virtual try-on, scene generation, or product visualization.

Key insights

DIRECT enables 3D-aware object insertion by decomposing guidance into appearance, geometry, and context, injected separately for precise control.

Principles

Decomposing conditions avoids feature entanglement.
Separate injection pathways preserve distinct attributes.
User-adjusted 3D proxies enable explicit pose control.
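
To make the idea of a user-adjusted 3D proxy concrete, here is a minimal sketch of posing a proxy and projecting it into the image plane. It is not DIRECT's implementation: the cube proxy, the yaw-only rotation, the pinhole camera, and every name and default value (rotation_y, pose_proxy, focal, center) are assumptions chosen for illustration.

```python
import math

def rotation_y(deg):
    """Rotation matrix for a yaw of `deg` degrees about the vertical (y) axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def apply(matrix, point):
    """Multiply a 3x3 matrix by a 3D point."""
    return tuple(sum(matrix[r][k] * point[k] for k in range(3)) for r in range(3))

def project(point, focal=500.0, center=(256.0, 256.0)):
    """Pinhole projection of a camera-space point to pixel coordinates (assumed camera)."""
    x, y, z = point
    return (center[0] + focal * x / z, center[1] + focal * y / z)

def proxy_corners(half=0.5):
    """Eight corners of an axis-aligned cube, standing in for the 3D proxy (assumption)."""
    return [(sx * half, sy * half, sz * half)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

def pose_proxy(yaw_deg, translation=(0.0, 0.0, 4.0)):
    """Rotate the proxy, move it in front of the camera, and return its 2D corner positions."""
    rot = rotation_y(yaw_deg)
    pixels = []
    for corner in proxy_corners():
        x, y, z = apply(rot, corner)
        pixels.append(project((x + translation[0], y + translation[1], z + translation[2])))
    return pixels
```

Calling pose_proxy(30.0) returns the eight projected corners for a 30-degree yaw. In a pipeline of this kind, a rendering of the posed proxy (rather than bare corner points) would plausibly serve as the geometry guidance.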

Method

DIRECT integrates interactive pose manipulation with 2D image synthesis. It decomposes the insertion conditions into appearance, geometry, and context guidance and injects each through a separate pathway. An automated data construction pipeline improves the diversity and quality of the training data.
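
The decomposition can be illustrated with a toy example that keeps the three conditions in separate fields and routes each through its own operation. The source does not say how DIRECT implements its pathways, so everything here is an assumption: cross-attention for appearance, an additive residual for geometry, a masked blend for context, and all names (InsertionConditions, attend, inject).

```python
import math
from dataclasses import dataclass

@dataclass
class InsertionConditions:
    appearance: list  # reference-object feature vectors (hypothetical)
    geometry: list    # one feature vector per latent position, from the posed proxy
    context: list     # one feature vector per latent position, from the background
    mask: list        # 1.0 where the object is inserted, 0.0 elsewhere

def attend(query, keys):
    """Softmax attention of one query over reference tokens (keys double as values)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * key[d] for w, key in zip(weights, keys)) / total
            for d in range(len(query))]

def inject(latents, cond):
    """Apply each condition through its own pathway instead of one fused input."""
    out = []
    for h, g, c, m in zip(latents, cond.geometry, cond.context, cond.mask):
        a = attend(h, cond.appearance)                          # appearance: cross-attention
        inside = [hi + gi + ai for hi, gi, ai in zip(h, g, a)]  # geometry: additive residual
        # context: background features are kept wherever the mask is 0
        out.append([m * v + (1.0 - m) * ci for v, ci in zip(inside, c)])
    return out
```

Because each condition enters through a different operation, changing the proxy pose alters only the geometry term, while the reference tokens and background features are left untouched. That is the intuition behind avoiding feature entanglement, shown here in a deliberately simplified form.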

In practice

Insert objects with explicit 3D pose control.
Preserve object appearance during scene integration.
Adapt objects seamlessly to target backgrounds.

Topics

3D Object Insertion
Diffusion Models
Pose Control
Image Synthesis
Computer Vision
Geometric Controllability

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.