Geometry-Aware Cross Modal Alignment for Light Field-LiDAR Semantic Segmentation

2025-07-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Researchers from Tsinghua University introduce TrafficScene, the first multimodal semantic segmentation dataset to combine light field images with LiDAR point clouds. It was collected with a 3×3 camera array (30 cm baseline) and a CH128X1 LiDAR, and comprises 5,607 light field images and 623 point cloud frames, with semantic annotations for every viewpoint. Building on TrafficScene, the team proposes Mlpfseg, a multimodal fusion segmentation network. Mlpfseg adds a Point-Pixel Feature Fusion Module (PFFM) to handle the density mismatch between the two modalities and a Depth Difference Perception Module (DDPM) to improve occlusion awareness. The method achieves state-of-the-art performance, exceeding image-only segmentation by 1.71 mIoU and point cloud-only segmentation by 2.38 mIoU, with notable improvements on small and occluded objects.
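The reported gains are measured in mIoU, the mean over classes of intersection-over-union. Because mIoU is usually reported as a percentage, the 1.71 and 2.38 gains presumably denote percentage points. The sketch below is a minimal NumPy version of the metric, assuming integer label maps and an ignore label of 255 (both assumptions, not details from the paper). It also checks that the image count equals 623 frames times 9 views, which is consistent with one 3×3 capture per LiDAR frame; that pairing is an inference, not something the summary states.

```python
import numpy as np


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    valid = target != ignore_index  # assumed ignore label, excluded from scoring
    pred = pred[valid].astype(np.int64)
    target = target[valid].astype(np.int64)
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    denom = tp + fp + fn
    present = denom > 0  # skip classes absent from both maps
    return float((tp[present] / denom[present]).mean())


# Dataset bookkeeping: 9 views per capture times 623 frames gives the image count.
VIEWS_PER_FRAME = 3 * 3
FRAMES = 623
assert FRAMES * VIEWS_PER_FRAME == 5607
```

For example, `mean_iou(np.array([0, 1, 1, 2]), np.array([0, 1, 2, 2]), 3)` yields per-class IoUs of 1, 1/2 and 1/2, so a mean of about 0.667.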

Key takeaway

For research scientists building autonomous driving perception systems, combining light field imaging with LiDAR is a promising way to improve segmentation of occluded and small objects. Explore multimodal fusion networks such as Mlpfseg, which exploit both the spatial and angular information in light fields, and build or use datasets annotated across all viewpoints to get the most out of complex urban scenes.
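The angular information in a camera-array light field is largely geometric: a scene point shifts between views by an amount inversely proportional to its depth. The sketch below illustrates this for rectified pinhole cameras on a 3×3 grid. It assumes that the 30 cm baseline is the spacing between adjacent cameras and uses a hypothetical 1000-pixel focal length; neither the interpretation nor the focal length comes from the paper.

```python
import numpy as np

FOCAL_PX = 1000.0   # hypothetical focal length in pixels
BASELINE_M = 0.30   # assumed spacing between adjacent cameras in the array


def view_offsets(grid: int = 3, baseline: float = BASELINE_M) -> np.ndarray:
    """Camera-centre offsets (tx, ty) in metres, centre view at (0, 0).

    Returns shape (grid, grid, 2), indexed [row, col].
    """
    idx = np.arange(grid) - (grid - 1) / 2  # [-1, 0, 1] for a 3x3 array
    ty, tx = np.meshgrid(idx * baseline, idx * baseline, indexing="ij")
    return np.stack([tx, ty], axis=-1)


def pixel_shift(depth_m: float, focal_px: float = FOCAL_PX) -> np.ndarray:
    """Image-plane shift (du, dv) of a point at depth Z in each view vs. the centre view.

    For rectified pinhole cameras translated by t, u_view = u_centre - f * t / Z.
    """
    return -focal_px * view_offsets() / depth_m
```

Under these assumptions, `pixel_shift(10.0)` moves a point 30 px between adjacent views, and `pixel_shift(20.0)` halves that to 15 px. This depth-dependent parallax is what a light field adds to a single image.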

Key insights

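Integrating light field images and LiDAR point clouds improves semantic segmentation over either modality alone (by 1.71 mIoU over image-only and 2.38 mIoU over point cloud-only in the reported experiments), especially for occluded objects.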

Principles

Multimodal data fusion improves perception.
Depth cues are critical for occlusion awareness.
Annotating all light field viewpoints is crucial.

Method

Mlpfseg fuses light field images and LiDAR point clouds. A Point-Pixel Feature Fusion Module reconciles the differing densities of points and pixels, a Depth Difference Perception Module adds occlusion awareness, and the network segments both modalities simultaneously.
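Point-pixel fusion starts by relating each LiDAR point to an image location. The sketch below shows a generic projection-and-gather baseline: it projects points with an assumed extrinsic matrix `T_cam_from_lidar` and intrinsic matrix `K`, samples the image feature at the nearest pixel, and concatenates it with the point's own feature plus a validity flag. This is not the authors' PFFM. The function names, nearest-pixel sampling and zero-filling for off-image points are all illustrative assumptions.

```python
import numpy as np


def project_points(points_lidar, T_cam_from_lidar, K, height, width):
    """Project N x 3 LiDAR points into the image.

    Returns integer pixel coords uv [N, 2], camera depth [N] and a validity mask [N].
    """
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])
    cam = (T_cam_from_lidar @ homo.T).T[:, :3]  # points in the camera frame
    depth = cam[:, 2]
    in_front = depth > 1e-6
    uvw = (K @ cam.T).T
    safe_z = np.where(in_front, uvw[:, 2], 1.0)  # avoid dividing by z <= 0
    uv = np.floor(uvw[:, :2] / safe_z[:, None]).astype(np.int64)
    valid = (in_front & (uv[:, 0] >= 0) & (uv[:, 0] < width)
             & (uv[:, 1] >= 0) & (uv[:, 1] < height))
    return uv, depth, valid


def fuse_point_pixel(point_feats, image_feats, uv, valid):
    """Concatenate [point feature, sampled image feature, validity flag] per point.

    image_feats has shape (H, W, C); off-image points receive zero image features.
    """
    sampled = np.zeros((point_feats.shape[0], image_feats.shape[-1]),
                       dtype=image_feats.dtype)
    sampled[valid] = image_feats[uv[valid, 1], uv[valid, 0]]  # index as [row v, col u]
    flag = valid[:, None].astype(point_feats.dtype)
    return np.concatenate([point_feats, sampled, flag], axis=1)
```

The validity flag lets a downstream layer distinguish "no image evidence" from a genuinely zero feature. Handling the opposite direction, where many pixels have no nearby point, would need a separate pixel-to-point step that this sketch omits.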

In practice

Use light field data to add angular information to the spatial view.
Employ depth difference perception to identify occluded regions.
Develop datasets with full-viewpoint annotations.
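A simple, hand-crafted way to see why depth differences signal occlusion is sketched below. Near a depth discontinuity, background pixels bordering a closer object are the ones likely to be hidden in neighbouring views. The function flags pixels that lie farther than a 4-neighbour by more than a threshold. The name `occlusion_candidates`, the 1 m threshold and the convention that zero depth means "missing" are hypothetical choices. DDPM is a learned module, and this heuristic is not its implementation.

```python
import numpy as np


def occlusion_candidates(depth: np.ndarray, threshold_m: float = 1.0) -> np.ndarray:
    """Flag pixels farther than a horizontal or vertical neighbour by > threshold_m.

    Zero depth is treated as missing and never compared.
    """
    h, w = depth.shape
    valid = depth > 0
    mask = np.zeros((h, w), dtype=bool)
    for dy, dx in ((0, 1), (1, 0)):  # right neighbour, then lower neighbour
        a = depth[: h - dy, : w - dx]
        b = depth[dy:, dx:]
        both = valid[: h - dy, : w - dx] & valid[dy:, dx:]
        diff = a - b  # positive where a is farther than its neighbour b
        mask[: h - dy, : w - dx] |= both & (diff > threshold_m)
        mask[dy:, dx:] |= both & (-diff > threshold_m)
    return mask
```

For a depth map with a 5 m object in front of a 20 m background, only the 20 m pixels touching the object are flagged, which matches the band that shifts out of sight between light field views.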

Topics

Light Field-LiDAR Fusion
Semantic Segmentation
TrafficScene Dataset
Mlpfseg Network
Occlusion Perception

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.