LEXIS: LatEnt ProXimal Interaction Signatures for 3D HOI from an Image
Summary
LEXIS-Flow is a novel diffusion framework designed to reconstruct 3D Human-Object Interaction (HOI) from a single RGB image, addressing the challenge of capturing subtle physical coupling between bodies and objects. Unlike existing methods that use sparse, binary contact cues, LEXIS-Flow employs InterFields, a representation encoding dense, continuous proximity across entire body and object surfaces. Because inferring these fields from a single image is ill-posed, the framework relies on LEXIS, a discrete manifold of interaction signatures learned through a VQ-VAE. LEXIS-Flow uses these signatures to estimate human and object meshes along with their InterFields, which then guide a refinement process that yields physically plausible, proximity-aware reconstructions without post-hoc optimization. On the Open3DHOI and BEHAVE datasets, LEXIS-Flow is reported to significantly outperform current baselines in reconstruction, contact, and proximity quality, with improved generalization and perceived realism.
Key takeaway
For research scientists building perception systems, LEXIS-Flow offers an approach to 3D HOI reconstruction that moves beyond sparse contact cues. Consider integrating dense, continuous proximity representations such as InterFields, together with learned interaction signatures, to obtain more physically plausible and realistic scene understanding.
Key insights
LEXIS-Flow reconstructs 3D human-object interactions from images using dense proximity fields and learned interaction signatures.
Principles
- Dense proximity fields improve HOI realism.
- Interaction patterns are structured by action and object geometry.
Method
LEXIS-Flow first uses a VQ-VAE to learn discrete interaction signatures (LEXIS). A diffusion framework then estimates the human and object meshes together with their InterFields, which guide a proximity-aware refinement of the reconstruction.
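To make the "discrete signatures" idea concrete, the sketch below shows the generic vector-quantization step at the heart of any VQ-VAE: each continuous latent vector is snapped to its nearest codebook entry. It is a minimal illustration, not the LEXIS implementation. The function name `quantize`, the codebook size, and the toy 2D latents are all assumptions made for the example.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (squared L2).

    Generic VQ-VAE lookup; the names and shapes here are illustrative
    assumptions, not taken from the LEXIS paper.
    latents:  (num_latents, dim)
    codebook: (num_codes, dim)
    """
    # Pairwise squared distances, shape (num_latents, num_codes).
    diffs = latents[:, None, :] - codebook[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Index of the closest code for every latent.
    indices = np.argmin(sq_dists, axis=1)
    return indices, codebook[indices]

# Toy codebook with three hypothetical "signatures" in a 2D latent space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
latents = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])

indices, quantized = quantize(latents, codebook)
print(indices)    # discrete code index per latent
print(quantized)  # the codebook vectors that replace the latents
```

In a full VQ-VAE, this lookup sits between an encoder and a decoder, and the codebook is trained jointly with them. Only the nearest-neighbor assignment is shown here.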
In practice
- Use InterFields for continuous proximity modeling.
- Employ a VQ-VAE to learn interaction signatures.
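The contrast between dense proximity and binary contact can be illustrated with a small sketch. This summary does not give the exact definition of InterFields, so the code assumes a simple stand-in: for points sampled on each surface, the unsigned distance to the nearest point on the other surface. The function names, the 5 cm contact threshold, and the toy point sets are hypothetical.

```python
import numpy as np

def proximity_fields(human_pts, object_pts):
    """Per-point distance to the nearest point on the other surface.

    Assumed stand-in for a dense proximity field; the paper's actual
    InterFields formulation may differ.
    human_pts:  (H, 3) points sampled on the body surface
    object_pts: (O, 3) points sampled on the object surface
    """
    # Pairwise Euclidean distances, shape (H, O).
    diffs = human_pts[:, None, :] - object_pts[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    human_field = dists.min(axis=1)   # one value per human point
    object_field = dists.min(axis=0)  # one value per object point
    return human_field, object_field

def binary_contact(field, threshold=0.05):
    """Sparse contact cue: True where a point lies within the threshold."""
    return field < threshold

# Toy data in meters: three body points along z, two object points.
human_pts = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]])
object_pts = np.array([[0.03, 0.0, 0.0], [0.5, 0.0, 1.0]])

human_field, object_field = proximity_fields(human_pts, object_pts)
contact = binary_contact(human_field)
print(human_field)  # continuous proximity for every body point
print(contact)      # binary labels for the same points
```

In this toy case the binary labels treat the second and third body points identically (both "no contact"), while the continuous field still records that one is much closer to the object than the other. This brute-force version scales as H x O; a spatial index such as a k-d tree would be the usual choice for full-resolution meshes.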
Topics
- 3D Human-Object Interaction
- Latent ProXimal Interaction Signatures
- InterFields Representation
- VQ-VAE
- Diffusion Models
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.