One-Shot Novel View and Pose Human Image Synthesis via 3D Prior Guided Diffusion Model

2026-06-18 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

The paper proposes a conditional denoising diffusion model for one-shot human image synthesis under novel views and poses. It targets the limitations of existing 2D pose transfer and generalizable human NeRF methods by splitting the synthesis problem into a sequence of conditional denoising steps. Two 3D human priors, a 3D normal map and a color prompt, serve as the geometry and color conditions, which lets the model generate humans in complex, arbitrary poses and recover occluded or invisible body parts. A self-reconstruction-based customized refinement further improves fine details when the model is applied to a previously unseen person. According to the authors, experiments on public datasets show significant performance gains and better generalization than previous methods. The authors state that the code will be released at https://github.com/Yankeegsj/3DPGDM.

Key takeaway

For computer vision engineers building human image synthesis systems, this 3D prior-guided diffusion model offers an approach to one-shot novel view and pose generation that addresses limitations of 2D pose transfer and generalizable NeRFs. The part most worth studying is how the 3D normal map and color prompt are integrated as conditions, since that is what supports high-quality synthesis of complex poses and occluded parts and the reported generalization across diverse subjects.

Key insights

A 3D prior-guided conditional diffusion model enables high-quality one-shot novel view and pose human image synthesis.

Principles

Decompose complex synthesis into denoising steps.
Integrate 3D priors for robust geometry and color.
Refine novel subjects via self-reconstruction.
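The first two principles can be illustrated with a minimal NumPy sketch. It assumes a standard DDPM-style linear noise schedule and assumes the conditions are stacked with the noisy image along the channel axis. Neither detail is confirmed by this summary, and the helper names (`make_alpha_bars`, `noise_image`, `build_denoiser_input`) and the random stand-in arrays are hypothetical.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_image(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) in closed form and return the noise used."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

def build_denoiser_input(x_t, normal_map, color_prompt):
    """Stack the noisy image and both conditions along the channel axis.

    Channel concatenation is an illustrative assumption, not the paper's
    confirmed conditioning mechanism.
    """
    return np.concatenate([x_t, normal_map, color_prompt], axis=0)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))            # stand-in target image (C, H, W)
normal_map = rng.uniform(-1.0, 1.0, size=(3, 8, 8))    # stand-in 3D normal map (geometry)
color_prompt = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # stand-in color condition

alpha_bars = make_alpha_bars()
x_t, eps = noise_image(x0, t=500, alpha_bars=alpha_bars, rng=rng)
net_input = build_denoiser_input(x_t, normal_map, color_prompt)
print(net_input.shape)
```

During training, a network would take `net_input` and the step index and be asked to predict `eps`; each denoising step is then a small regression problem instead of a single large image-to-image mapping.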

Method

A conditional denoising diffusion model synthesizes a human image in a novel view or pose by iterative denoising, guided by a 3D normal map (geometry condition) and a color prompt (color condition), with a self-reconstruction-based refinement applied for new subjects.
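As a rough illustration of the iterative denoising, the sketch below runs a standard DDPM ancestral sampling loop in NumPy. The sampler and schedule are assumptions, since this summary does not say which ones the paper uses. `toy_denoiser` is a hypothetical stand-in for the trained network: it simply treats the color prompt, masked by where the normal map is non-zero, as the clean image.

```python
import numpy as np

def toy_denoiser(x_t, alpha_bar_t, normal_map, color_prompt):
    """Hypothetical stand-in for a trained conditional noise predictor.

    It guesses the clean image as the color prompt inside the body region
    (pixels with a non-zero normal) and returns the noise implied by that guess.
    """
    body_mask = np.abs(normal_map).sum(axis=0, keepdims=True) > 0
    x0_guess = color_prompt * body_mask
    return (x_t - np.sqrt(alpha_bar_t) * x0_guess) / np.sqrt(1.0 - alpha_bar_t)

def sample(normal_map, color_prompt, num_steps=1000, seed=0):
    """DDPM-style reverse process: start from noise, denoise step by step."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)   # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(color_prompt.shape)  # x_T: pure Gaussian noise
    for t in range(num_steps - 1, -1, -1):
        eps_hat = toy_denoiser(x, alpha_bars[t], normal_map, color_prompt)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

rng = np.random.default_rng(1)
normal_map = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
normal_map[:, :, :2] = 0.0                       # two background columns with no body
color_prompt = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
image = sample(normal_map, color_prompt)
print(image.shape)
```

Because the toy denoiser always points toward the same guess, the loop converges to that guess. A real network would instead predict noise from learned appearance and geometry, which is where the quality of the conditions matters.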

In practice

Generate complex human poses from single images.
Accurately recover occluded human body parts.
Enhance fine details for unseen individuals.
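The last item depends on the self-reconstruction refinement. One plausible reading, which this summary does not confirm, is a short fine-tuning pass in which the denoiser learns to reconstruct the single source image of the new person from noised copies of it. The toy below shows only that idea, with a hypothetical linear denoiser, a single fixed noise level, and plain gradient descent. It is not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # size of the flattened toy "image"
x0 = rng.uniform(-1.0, 1.0, size=d)       # the single source image of the new person
alpha_bar = 0.5                           # one fixed noise level keeps the toy linear
a, s = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)

# Fixed batch of noised copies of the source image and the noise that made them.
eps = rng.standard_normal((256, d))
x_t = a * x0 + s * eps

# Hypothetical linear denoiser eps_hat = x_t @ W.T + b, starting from zeros.
W = np.zeros((d, d))
b = np.zeros(d)

def noise_loss(W, b):
    """Mean squared error between predicted and true noise."""
    return np.mean((x_t @ W.T + b - eps) ** 2)

loss_before = noise_loss(W, b)
lr = 0.5
for _ in range(500):
    residual = x_t @ W.T + b - eps                      # shape (256, d)
    grad_W = 2.0 * residual.T @ x_t / residual.size     # d(loss)/dW
    grad_b = 2.0 * residual.sum(axis=0) / residual.size # d(loss)/db
    W -= lr * grad_W
    b -= lr * grad_b
loss_after = noise_loss(W, b)
print(loss_before, loss_after)
```

The point of the toy is that fitting on one subject's own image specializes the denoiser to that subject; in a real system the number of refinement steps would have to be chosen so that the model sharpens details without losing its ability to render new poses.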

Topics

One-Shot Human Image Synthesis
Novel View Synthesis
Pose Transfer
Diffusion Models
3D Human Priors
Computer Vision

Code references

Yankeegsj/3DPGDM

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.