Geometry-Aware Cross Modal Alignment for Light Field-LiDAR Semantic Segmentation

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Researchers from Tsinghua University introduce TrafficScene, described as the first multimodal dataset for semantic segmentation that combines light field images with LiDAR point clouds. The dataset was collected with a 3×3 camera array (30 cm baseline) and a CH128X1 LiDAR. It comprises 5607 light field images and 623 point cloud frames, which is consistent with nine views for each of the 623 frames, and every viewpoint is semantically annotated. Building on TrafficScene, the team proposes Mlpfseg, a multimodal fusion segmentation network. Mlpfseg uses a Point-Pixel Feature Fusion Module (PFFM) to address the density mismatch between the two modalities and a Depth Difference Perception Module (DDPM) to improve occlusion awareness. The method is reported to reach state-of-the-art performance, outperforming image-only segmentation by 1.71 mIoU and point cloud-only segmentation by 2.38 mIoU, with the largest gains on small and occluded objects.
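The gains above are stated in mIoU (mean intersection-over-union), the per-class IoU averaged over classes. As a minimal sketch of how that metric is usually computed, assuming flat lists of integer class labels and skipping classes that appear in neither prediction nor ground truth (the function name `mean_iou` and the toy labels are illustrative, not taken from the paper):

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        # Intersection: elements labeled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Union: elements labeled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both (an assumption of this sketch)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy labels for six pixels (or points) and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)  # per-class IoUs: 1/3, 2/3, 1/2
```

The same function applies to either modality, since pixels and points are both reduced to a flat sequence of labels before scoring.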

Key takeaway

For research scientists developing autonomous driving perception systems, combining light field imaging with LiDAR data is a promising way to improve segmentation of occluded and small objects. You should explore multimodal fusion networks like Mlpfseg, which use both the spatial and the angular information in a light field, and consider building or adopting datasets annotated across all viewpoints to get the most out of fusion in complex urban environments.

Key insights

Integrating light field images and LiDAR point clouds improves semantic segmentation over either modality alone (by 1.71 mIoU over image-only and 2.38 mIoU over point cloud-only), especially for small and occluded objects.

Method

Mlpfseg fuses light field images and LiDAR point clouds using a Point-Pixel Feature Fusion Module for density matching and a Depth Difference Perception Module for occlusion awareness, segmenting both modalities simultaneously.
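To make the idea of point-pixel fusion concrete, the sketch below projects each LiDAR point into one camera view with a pinhole model, pairs the point's feature with the feature of the pixel it lands on, and appends a depth-difference value as a simple occlusion cue. This is not the paper's PFFM or DDPM: the pinhole model, the nearest-pixel lookup, the per-view `depth_map`, and every function and parameter name here are assumptions made for illustration.

```python
def project_point(point, fx, fy, cx, cy):
    """Pinhole projection of a 3D point given in camera coordinates
    (x right, y down, z forward). Returns (u, v) or None if behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return fx * x / z + cx, fy * y / z + cy


def fuse_point_pixel(points, point_feats, image_feats, depth_map, fx, fy, cx, cy):
    """Concatenate each visible point's feature with the feature of the pixel it
    projects to, plus the difference between the point depth and the image depth."""
    height, width = len(image_feats), len(image_feats[0])
    fused = []
    for point, p_feat in zip(points, point_feats):
        uv = project_point(point, fx, fy, cx, cy)
        if uv is None:
            continue  # point lies behind the camera
        col, row = int(round(uv[0])), int(round(uv[1]))
        if not (0 <= row < height and 0 <= col < width):
            continue  # point falls outside this view
        # A large positive value suggests the point is farther than the surface
        # seen in the image at that pixel, i.e. possibly occluded in this view.
        depth_diff = point[2] - depth_map[row][col]
        fused.append(list(p_feat) + list(image_feats[row][col]) + [depth_diff])
    return fused


# Toy 2x2 view: one feature channel per pixel and an image-derived depth map.
image_feats = [[[0.1], [0.2]], [[0.3], [0.4]]]
depth_map = [[1.0, 0.5], [1.0, 1.0]]
points = [(1.0, 0.0, 1.0), (0.0, 0.0, -1.0), (5.0, 5.0, 1.0)]
point_feats = [[1.0], [2.0], [3.0]]
fused = fuse_point_pixel(points, point_feats, image_feats, depth_map,
                         fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Only the first toy point survives: the second is behind the camera and the third projects outside the image. With a 3x3 camera array, the same lookup could be repeated per view, which is one way multiple viewpoints can help with points hidden in a single view.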

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.