GeoRelight: Learning Joint Geometrical Relighting and Reconstruction with Flexible Multi-Modal Diffusion Transformers

2026-04-22 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

GeoRelight is a Multi-Modal Diffusion Transformer designed to jointly solve the ill-posed task of relighting a person from a single 2D photo and reconstructing their 3D geometry. Prior methods often either chain separate stages, so that errors accumulate along the pipeline, or omit explicit 3D geometry and lose physical consistency. GeoRelight addresses both limitations with two technical innovations: isotropic NDC-Orthographic Depth (iNOD), a distortion-free 3D representation compatible with latent diffusion models, and a mixed-data training strategy that combines synthetic data with auto-labeled real data. By optimizing geometry and relighting jointly, GeoRelight outperforms prior sequential models and systems that do not use 3D geometry.

Key takeaway

For research scientists building computer vision models for human relighting or 3D reconstruction, GeoRelight shows that a unified, joint approach improves both performance and physical consistency over sequential pipelines and geometry-agnostic systems. Consider integrating 3D geometry explicitly into your relighting pipelines, and explore distortion-free 3D representations such as iNOD that are compatible with latent diffusion models.

Key insights

Jointly solving 3D geometry and relighting from a single image improves physical consistency and performance.

Principles

3D geometry and relighting are mutually beneficial tasks.
Distortion-free 3D representations enhance latent diffusion models.
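
To make "distortion-free" concrete, here is a minimal sketch of the general difference between isotropic and per-axis normalization of 3D points. It is not the paper's iNOD definition, which this summary does not give. The function names and the sample triangle are hypothetical and chosen only for illustration. The point is that a single shared scale factor preserves angles, while independent per-axis scaling does not.

```python
import math


def _bounds(points):
    """Return per-axis minima and maxima of a list of 3D points."""
    dims = list(zip(*points))
    return [min(d) for d in dims], [max(d) for d in dims]


def normalize_per_axis(points):
    """Scale each axis independently into [-1, 1] (anisotropic)."""
    lo, hi = _bounds(points)
    return [
        tuple(
            2.0 * (v - l) / (h - l) - 1.0 if h > l else 0.0
            for v, l, h in zip(p, lo, hi)
        )
        for p in points
    ]


def normalize_isotropic(points):
    """Center the points, then divide every axis by one shared scale."""
    lo, hi = _bounds(points)
    center = [(l + h) / 2.0 for l, h in zip(lo, hi)]
    half_extent = max((h - l) / 2.0 for l, h in zip(lo, hi))
    scale = half_extent if half_extent > 0 else 1.0
    return [tuple((v - c) / scale for v, c in zip(p, center)) for p in points]


def angle_at(a, b, c):
    """Angle in degrees at vertex b, formed by the rays b->a and b->c."""
    u = [x - y for x, y in zip(a, b)]
    w = [x - y for x, y in zip(c, b)]
    dot = sum(x * y for x, y in zip(u, w))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_w = math.sqrt(sum(x * x for x in w))
    return math.degrees(math.acos(dot / (norm_u * norm_w)))


# Hypothetical triangle that is twice as wide as it is tall.
triangle = [(2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 1.0, 0.0)]

original_angle = angle_at(*triangle)
isotropic_angle = angle_at(*normalize_isotropic(triangle))
per_axis_angle = angle_at(*normalize_per_axis(triangle))

print(f"original:  {original_angle:.3f} degrees")
print(f"isotropic: {isotropic_angle:.3f} degrees")
print(f"per-axis:  {per_axis_angle:.3f} degrees")
```

In this toy case the isotropic version keeps the original angle of about 26.6 degrees, while per-axis scaling stretches it to 45 degrees. A representation that distorts shape in this way would give a generative model a warped view of the geometry.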

Method

GeoRelight uses a Multi-Modal Diffusion Transformer with isotropic NDC-Orthographic Depth (iNOD) and a mixed-data training strategy combining synthetic and auto-labeled real data to jointly reconstruct geometry and relight.

In practice

Use iNOD for 3D representation in latent diffusion.
Combine synthetic and real data for robust training.
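
As a rough illustration of combining the two data sources, the sketch below draws each training batch from a synthetic pool and an auto-labeled real pool at a fixed ratio. The summary does not state the ratio or the sampling scheme GeoRelight uses, so `mixed_batches`, `real_fraction`, and the sample identifiers are all hypothetical. Each sample is tagged with its origin so that a training loop could, for example, apply different supervision to ground-truth synthetic labels and to auto-generated real labels.

```python
import random


def mixed_batches(synthetic, real, batch_size, real_fraction, num_batches, seed=0):
    """Yield batches that mix synthetic and real samples at a fixed ratio.

    real_fraction is a hypothetical knob: the share of each batch drawn
    from the auto-labeled real pool. Samples are drawn with replacement.
    """
    rng = random.Random(seed)
    n_real = round(batch_size * real_fraction)
    n_synthetic = batch_size - n_real
    for _ in range(num_batches):
        batch = [("synthetic", rng.choice(synthetic)) for _ in range(n_synthetic)]
        batch += [("real", rng.choice(real)) for _ in range(n_real)]
        rng.shuffle(batch)  # avoid a fixed source order inside the batch
        yield batch


# Hypothetical sample identifiers standing in for images and labels.
synthetic_pool = ["syn_000", "syn_001", "syn_002"]
real_pool = ["real_000", "real_001"]

batches = list(
    mixed_batches(synthetic_pool, real_pool, batch_size=8,
                  real_fraction=0.25, num_batches=3)
)
for batch in batches:
    n_real_in_batch = sum(1 for source, _ in batch if source == "real")
    print(f"{len(batch)} samples, {n_real_in_batch} real")
```

A fixed per-batch ratio keeps every gradient step exposed to both domains. Sampling each item's source at random, or scheduling the ratio over training, are equally plausible alternatives.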

Topics

Geometrical Relighting
3D Reconstruction
Multi-Modal Diffusion Transformers
Isotropic NDC-Orthographic Depth
Latent Diffusion Models

Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.