SemanticXR: Low Power and Real-time Queryable Semantic Mapping with an Object-Level Device-Cloud Architecture

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Emerging Technologies & Innovation) · Depth: Expert, quick

Summary

SemanticXR is introduced as the first device-cloud system designed for real-time, open-vocabulary semantic mapping and querying, specifically addressing the power, bandwidth, and memory constraints of mobile Extended Reality (XR) devices. This system elevates semantically identifiable objects to first-class units for communication, execution, and memory across both device and server components. Its architecture employs object-level parallelism and geometry downsampling on the server to improve mapping latency by 2.2X, while depth-mapping co-design keeps upstream bandwidth under 2.5 Mbps. On the device, SemanticXR utilizes an object-level sparse local map with incremental updates, enabling sub-100 ms query latency for up to 10,000 objects, even with network drops. It supports tens of thousands of objects within 500 MB, scales downstream bandwidth with map changes, and adds only 2% device power during normal operation.
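To make the device-side idea concrete, here is a minimal sketch of an object-level local map that accepts incremental updates and answers queries without a network round trip. It is an illustration under stated assumptions, not the paper's implementation: the names `MapObject`, `LocalObjectMap`, `apply_delta`, and `query` are hypothetical, and ranking by cosine similarity over per-object embeddings is an assumed query mechanism that the summary does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class MapObject:
    """One semantically identifiable object (hypothetical record layout)."""
    object_id: int
    embedding: list          # assumed open-vocabulary feature vector
    centroid: tuple          # assumed sparse geometry: a single 3D point
    version: int = 0         # assumed per-object version counter


def _cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LocalObjectMap:
    """Device-side map keyed by object id, updated one object at a time."""

    def __init__(self):
        self.objects = {}

    def apply_delta(self, upserts, removed_ids=()):
        """Apply an incremental update containing only changed objects."""
        for obj in upserts:
            current = self.objects.get(obj.object_id)
            # Ignore stale updates that arrive out of order.
            if current is None or obj.version > current.version:
                self.objects[obj.object_id] = obj
        for object_id in removed_ids:
            self.objects.pop(object_id, None)

    def query(self, query_embedding, top_k=1):
        """Rank objects locally, so a network drop does not block queries."""
        scored = [
            (_cosine(query_embedding, obj.embedding), obj.object_id)
            for obj in self.objects.values()
        ]
        scored.sort(reverse=True)
        return [object_id for _, object_id in scored[:top_k]]
```

Because each update carries only the objects that changed, downstream traffic in this sketch grows with map changes rather than with total map size, which mirrors the scaling behavior described above.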

Key takeaway

For AI engineers building XR applications that need real-time semantic mapping, SemanticXR's device-cloud architecture is designed around the power, bandwidth, and memory limits of mobile devices. The reported results are open-vocabulary mapping with sub-100 ms query latency for up to 10,000 objects, about 2% added device power during normal operation, and continued querying through network drops. Consider making the object the unit of communication, execution, and memory across your own device and cloud components, so that bandwidth and compute scale with what changes in the scene rather than with the whole map.

Key insights

SemanticXR uses an object-level device-cloud architecture for low-power, real-time, queryable semantic mapping in XR.

Method

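SemanticXR splits semantic mapping between device and cloud and treats each semantically identifiable object as the unit of communication, execution, and memory. On the server, object-level parallelism and geometry downsampling improve mapping latency by 2.2X, and depth-mapping co-design keeps upstream bandwidth under 2.5 Mbps. On the device, an object-level sparse local map with incremental updates serves queries locally.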
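The server-side combination of object-level parallelism and geometry downsampling can be sketched as follows. This is a simplified illustration, not the authors' pipeline: `voxel_downsample` and `update_objects` are hypothetical names, voxel-grid averaging is one common downsampling technique chosen here as an assumption, and the thread pool only shows the per-object task structure (a production system would more likely use processes or GPU kernels for real speedup).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel with their average."""
    buckets = {}
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets.setdefault(key, []).append((x, y, z))
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in buckets.values()
    ]


def update_objects(object_points, voxel_size=0.05, workers=4):
    """Downsample each object's geometry as an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Objects do not share state, so each one can be processed on its own.
        results = pool.map(
            lambda pts: voxel_downsample(pts, voxel_size),
            object_points.values(),
        )
        return dict(zip(object_points.keys(), results))
```

Treating each object as its own task is what makes the work parallelizable here: no object's update depends on another's geometry, so the tasks need no coordination beyond collecting the results.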

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, AI Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.