SemanticXR: Low Power and Real-time Queryable Semantic Mapping with an Object-Level Device-Cloud Architecture
Summary
SemanticXR is presented as the first device-cloud system for real-time, open-vocabulary semantic mapping and querying under the power, bandwidth, and memory constraints of mobile Extended Reality (XR) devices. It treats semantically identifiable objects as first-class units for communication, execution, and memory on both the device and the server. On the server, object-level parallelism and geometry downsampling improve mapping latency by 2.2x, while depth-mapping co-design keeps upstream bandwidth under 2.5 Mbps. On the device, an object-level sparse local map with incremental updates delivers sub-100 ms query latency for up to 10,000 objects, even when the network drops. The system supports tens of thousands of objects within 500 MB, its downstream bandwidth scales with map changes, and it adds only 2% to device power during normal operation.
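To make the device-side idea concrete, here is a minimal sketch of an object-level local map that accepts incremental updates and answers queries locally, so lookups keep working while the network is down. It is an illustration under stated assumptions, not the authors' implementation: the names `ObjectEntry`, `LocalObjectMap`, `apply_update`, and `query` are hypothetical, and cosine similarity over per-object embeddings is assumed as the matching rule.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectEntry:
    """One mapped object (hypothetical layout, not taken from the paper)."""
    object_id: int
    embedding: list          # open-vocabulary feature vector for the object
    centroid: tuple          # (x, y, z) position in the map frame


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LocalObjectMap:
    """Sparse on-device map keyed by object id."""

    def __init__(self):
        self._objects = {}

    def apply_update(self, upserts, removed_ids=()):
        # Incremental update: only changed objects are sent and touched,
        # so downstream traffic follows the amount of map change.
        for entry in upserts:
            self._objects[entry.object_id] = entry
        for object_id in removed_ids:
            self._objects.pop(object_id, None)

    def query(self, query_embedding, top_k=1):
        # Runs entirely on the device against the local copy.
        scored = sorted(
            self._objects.values(),
            key=lambda e: cosine(query_embedding, e.embedding),
            reverse=True,
        )
        return scored[:top_k]

    def __len__(self):
        return len(self._objects)


local_map = LocalObjectMap()
local_map.apply_update([
    ObjectEntry(1, [1.0, 0.0], (0.2, 0.0, 1.5)),
    ObjectEntry(2, [0.0, 1.0], (1.0, 0.0, 2.0)),
])
best = local_map.query([0.9, 0.1])[0]
```

A linear scan is used here for brevity; a real system serving thousands of objects within a latency budget would likely need a more careful index.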
Key takeaway
For AI engineers building XR applications that need real-time semantic mapping, SemanticXR's device-cloud architecture offers a way around mobile device constraints. It reports open-vocabulary mapping with sub-100 ms query latency and only 2% added device power, even with large object counts and an unstable network. Consider adopting an object-level approach to manage communication, execution, and memory across your device and cloud components.
Key insights
SemanticXR uses an object-level device-cloud architecture for low-power, real-time, queryable semantic mapping in XR.
Principles
- Elevate objects to first-class units.
- Use object-level parallelism to reduce mapping latency.
- Use depth-mapping co-design to limit upstream bandwidth.
Method
SemanticXR splits semantic mapping between the device and the cloud, treats objects as the primary units of communication and processing, and combines object-level parallelism, geometry downsampling, and depth-mapping co-design.
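The server-side combination of object-level parallelism and geometry downsampling can be sketched as follows. This is a hedged illustration only: the summary does not say which downsampling scheme or scheduler SemanticXR uses, so a voxel-grid average and a thread pool are assumed here, and `voxel_downsample` and `process_objects` are hypothetical names.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def voxel_downsample(points, voxel_size):
    """Replace all points falling in the same voxel with their mean."""
    buckets = {}
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets.setdefault(key, []).append((x, y, z))
    return [
        tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for pts in buckets.values()
    ]


def process_objects(objects, voxel_size=0.05, workers=4):
    """Downsample each object's geometry as an independent task.

    `objects` maps an object name to its list of (x, y, z) points.
    Because objects do not depend on each other, each one can be
    handed to a separate worker.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda pts: voxel_downsample(pts, voxel_size),
            objects.values(),
        )
        return dict(zip(objects.keys(), results))


objects = {
    "mug": [(0.0, 0.0, 0.0), (0.01, 0.01, 0.01), (1.0, 1.0, 1.0)],
    "chair": [(0.5, 0.5, 0.5)],
}
downsampled = process_objects(objects, voxel_size=0.05)
```

The thread pool only shows the task structure. In CPython, pure-Python work like this does not run in parallel across threads, so a production system would more plausibly use processes, native code, or GPU kernels.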
In practice
- Enable spatial object search in XR.
- Power AI assistants with grounded interactions.
- Support large-scale semantic maps on mobile.
Topics
- Extended Reality
- Semantic Mapping
- Device-Cloud Architecture
- Object-Level Processing
- Low-Power Computing
- Real-time Systems
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, AI Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.