Leveraging Large Vision Model for Multi-UAV Co-perception in Low-Altitude Wireless Networks

Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Internet of Things (IoT) & Connected Devices · Depth: Advanced, extended

Summary

The Base-Station-Helped UAV (BHU) framework targets two bottlenecks in multi-UAV cooperative perception over low-altitude wireless networks: the volume of visual data and the communication latency it causes. Each UAV applies a Top-K selection mechanism to its captured RGB image and transmits only the most informative pixels, which cuts both data volume and latency. The sparsified images are sent over multi-user MIMO (MU-MIMO) to a ground server, where a Swin-large-based MaskDINO encoder extracts Bird's-Eye-View (BEV) features that are fused across UAVs to perceive ground vehicles. A diffusion-model-based deep reinforcement learning (DRL) algorithm jointly optimizes cooperative UAV selection, sparsification ratios, and precoding matrices. In simulations on the Air-Co-Pred dataset, BHU improves perception performance by over 5% and reduces communication overhead by 85% relative to traditional CNN-based BEV fusion baselines, which makes it a candidate for resource-constrained environments.
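The Top-K step can be illustrated with a minimal sketch. It assumes a per-pixel importance map is already available; the summary does not say how the paper scores pixels, so `scores`, `keep_ratio`, and the function name `top_k_sparsify` are hypothetical, and zeroing the discarded pixels is only one possible way to represent the sparsified image.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one RGB pixel
Image = List[List[Pixel]]      # row-major grid of pixels
ScoreMap = List[List[float]]   # ASSUMED per-pixel importance scores


def top_k_sparsify(
    image: Image, scores: ScoreMap, keep_ratio: float
) -> Tuple[Image, List[Tuple[int, int]]]:
    """Keep the K highest-scoring pixels and blank out the rest.

    Returns (sparse_image, kept_coords). How `scores` is produced is an
    assumption of this sketch, not something the source specifies.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    height, width = len(image), len(image[0])
    k = math.ceil(keep_ratio * height * width)  # number of pixels to keep

    coords = [(r, c) for r in range(height) for c in range(width)]
    # Highest score first; sorted() is stable, so ties keep row-major order.
    ranked = sorted(coords, key=lambda rc: scores[rc[0]][rc[1]], reverse=True)
    kept = set(ranked[:k])

    blank: Pixel = (0, 0, 0)
    sparse = [
        [image[r][c] if (r, c) in kept else blank for c in range(width)]
        for r in range(height)
    ]
    return sparse, sorted(kept)
```

Calling `top_k_sparsify(image, scores, 0.15)` would keep 15% of the pixels. In a real system only the kept coordinates and their values would need to be transmitted, and a vectorized array library would replace the nested lists.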

Key takeaway

For AI scientists and computer vision engineers building multi-UAV perception systems, BHU offers a way around the communication bottleneck. In simulations on the Air-Co-Pred dataset, combining Top-K sparsification with BEV fusion based on a large vision model (LVM) gave over 5% better perception performance and 85% lower communication overhead than CNN-based BEV fusion baselines. Consider a DRL policy built on a denoising diffusion implicit model (DDIM) to adapt UAV selection, sparsification ratios, and precoding dynamically and maximize utility in resource-constrained low-altitude networks.

Key insights

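BHU lowers the communication cost of multi-UAV perception by sparsifying visual data at each UAV, running large vision model (LVM) inference at a ground server, and using DRL to tune UAV selection, sparsification ratios, and precoding.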

Method

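In the BHU framework, each UAV selects its Top-K pixels and transmits the sparsified image via MU-MIMO to a ground server. The server extracts BEV features with a Swin-large-based MaskDINO encoder and fuses them across UAVs for ground vehicle perception. A DRL algorithm based on a denoising diffusion implicit model (DDIM) optimizes UAV selection, sparsification ratios, and precoding matrices.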
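To see why the sparsification ratio matters for the uplink, here is a back-of-the-envelope sketch of payload size and transmission latency. It rests on assumptions the summary does not state: 24-bit RGB pixels, one fixed-width position index sent with every kept pixel, and a constant link rate. The function names are hypothetical, and the paper's actual encoding and rate model may differ.

```python
import math


def dense_payload_bits(height: int, width: int, bits_per_pixel: int = 24) -> int:
    """Bits needed to send every pixel of an uncompressed image."""
    return height * width * bits_per_pixel


def sparse_payload_bits(
    height: int, width: int, keep_ratio: float, bits_per_pixel: int = 24
) -> int:
    """Bits needed to send only the kept pixels plus their positions.

    ASSUMPTION: each kept pixel carries a fixed-width index that is just
    wide enough to address any of the height * width positions.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    total = height * width
    kept = math.ceil(keep_ratio * total)
    index_bits = max(1, (total - 1).bit_length())
    return kept * (bits_per_pixel + index_bits)


def transmit_latency_s(payload_bits: int, rate_bps: float) -> float:
    """Transmission time in seconds at a constant link rate (bits/s)."""
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    return payload_bits / rate_bps
```

Under these assumptions the index overhead means the saving is somewhat smaller than the keep ratio alone suggests, which is one reason the ratio is worth optimizing jointly with UAV selection and precoding rather than fixing it in advance.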

Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Researcher, AI Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.