LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image

2026-04-22 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

LEXIS-Flow is a diffusion framework that reconstructs 3D Human-Object Interaction (HOI) from a single RGB image, targeting the subtle physical coupling between bodies and objects. Unlike existing methods that rely on sparse, binary contact cues, LEXIS-Flow uses InterFields, a representation that encodes dense, continuous proximity across the entire body and object surfaces. Because inferring these fields from a single image is ill-posed, the framework relies on LEXIS, a discrete manifold of interaction signatures learned with a VQ-VAE. LEXIS-Flow uses these signatures to estimate human and object meshes along with their InterFields, which then guide a refinement step toward physically plausible, proximity-aware reconstructions without post-hoc optimization. Evaluated on the Open3DHOI and BEHAVE datasets, LEXIS-Flow significantly outperforms current baselines in reconstruction, contact, and proximity quality, and improves generalization and perceived realism.

Key takeaway

For research scientists developing perception systems, LEXIS-Flow offers a robust approach to 3D HOI reconstruction. Consider integrating dense, continuous proximity representations such as InterFields, together with learned interaction signatures, to move beyond sparse contact cues toward more physically plausible and realistic scene understanding.

Key insights

LEXIS-Flow reconstructs 3D human-object interactions from a single image using dense proximity fields and learned interaction signatures.

Principles

Dense proximity fields improve HOI realism.
Interaction patterns are structured by action and object geometry.
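
To make the contrast between dense proximity and binary contact concrete, here is a minimal NumPy sketch. It assumes, purely for illustration, that a proximity field can be approximated as the distance from each point on one surface to the nearest point on the other; the actual InterFields definition is not given in this summary and may differ. The function names, the toy points, and the 2 cm threshold are hypothetical.

```python
import numpy as np

def proximity_field(src_pts, dst_pts):
    """Distance from every source point to its nearest destination point.

    Illustrative stand-in for a dense proximity field (an assumption,
    not the paper's InterFields definition).
    """
    # Pairwise differences, shape (N_src, N_dst, 3)
    diff = src_pts[:, None, :] - dst_pts[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    # Keep the closest destination point for each source point
    return dists.min(axis=1)

def binary_contact(field, threshold=0.02):
    """Sparse cue for comparison: True where distance is below threshold (hypothetical 2 cm)."""
    return field < threshold

# Toy point sets standing in for sampled human and object surfaces (metres)
human = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 1.0]])
obj = np.array([[0.01, 0.0, 0.0], [0.3, 0.0, 0.5]])

human_field = proximity_field(human, obj)   # one value per human point
object_field = proximity_field(obj, human)  # one value per object point
contact = binary_contact(human_field)       # collapses the field to on/off
```

In this toy case the binary cue marks only the first human point as in contact, while the continuous field also records that the second point is 0.3 m from the object, information a binary label discards.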

Method

LEXIS-Flow first uses a VQ-VAE to learn discrete interaction signatures (LEXIS). A diffusion framework then estimates the human and object meshes together with their InterFields, which guide a proximity-aware refinement of the reconstruction.
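
The discretization step at the core of any VQ-VAE can be sketched as a nearest-neighbour codebook lookup. The snippet below shows only that generic step under stated assumptions: the codebook values, latent dimensionality, and the `quantize` helper are hypothetical, and the encoder, decoder, training losses, and diffusion stage of LEXIS-Flow are not described in this summary and are not reproduced here.

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous latent vector to its nearest codebook entry.

    z:        (N, D) continuous latents (e.g. encoder outputs)
    codebook: (K, D) discrete code vectors
    Returns the chosen indices (N,) and the quantized latents (N, D).
    """
    # Squared Euclidean distance between every latent and every code, shape (N, K)
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

# Hypothetical 3-entry codebook in a 2-D latent space
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
# Hypothetical encoder outputs
z = np.array([[0.9, 1.2], [-0.1, 0.1], [-0.8, 0.7]])

indices, z_q = quantize(z, codebook)
```

Each index identifies one discrete code; in the framing of this summary, such codes would play the role of interaction signatures that a downstream model can condition on.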

In practice

Use InterFields for continuous proximity modeling.
Employ a VQ-VAE to learn interaction signatures.

Topics

3D Human-Object Interaction
Latent ProXimal Interaction Signatures
InterFields Representation
VQ-VAE
Diffusion Models

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.