Camera and LiDAR BEV Fusion for Cooperative 3D Object Detection on TUMTraf V2X

2026-06-11 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A camera and LiDAR fusion detector, built for the TUMTraf V2X cooperative 3D object detection track of the DriveX 2026 challenge, fuses three roadside cameras with a combined infrastructure-and-vehicle point cloud in a shared bird's-eye-view (BEV) space. A CenterPoint-style head predicts boxes, trained with a generalized IoU regression loss and paired with an IoU quality head that re-ranks detections. The model reaches 0.85 3D mAP on the public Codabench test split. The authors found that 44 of the 50 test frames also appear in the training (40) and validation (4) splits. Finetuning with these overlapping frames oversampled raises the score to 0.89 mAP, and replacing predictions on those frames with ground truth reaches 0.99 mAP. All three configurations are reported with per-class results.

Key takeaway

Machine learning engineers evaluating 3D object detection models for V2X applications should check test sets for overlap with the training and validation splits before trusting a leaderboard score. On this benchmark, 44 of 50 test frames overlap, and the gap between 0.85 mAP (the model's own predictions) and 0.99 mAP (ground truth substituted on the overlapping frames) shows how much of the score leaked frames can determine. Verify that test frames are disjoint from training data so the benchmark reflects generalization, especially for cooperative sensor fusion.
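One way to run such a check is to fingerprint every frame and look for identical fingerprints across splits. The sketch below is illustrative only: the function names, split names, and toy byte strings are hypothetical, and it assumes overlapping frames are byte-identical copies. It is not the procedure the authors used. Near-duplicates, such as re-encoded images or re-sampled point clouds, would need a looser comparison on timestamps or sensor content.

```python
import hashlib
from typing import Dict, List, Tuple


def frame_digest(payload: bytes) -> str:
    """Return a SHA-256 hex digest of a frame's raw bytes."""
    return hashlib.sha256(payload).hexdigest()


def find_overlap(
    test_frames: Dict[str, bytes],
    reference_splits: Dict[str, Dict[str, bytes]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each leaked test frame ID to the (split, frame ID) pairs it duplicates."""
    # Index every reference frame by its content digest.
    index: Dict[str, List[Tuple[str, str]]] = {}
    for split_name, frames in reference_splits.items():
        for frame_id, payload in frames.items():
            index.setdefault(frame_digest(payload), []).append((split_name, frame_id))

    # Keep only the test frames whose digest also occurs in a reference split.
    overlap: Dict[str, List[Tuple[str, str]]] = {}
    for frame_id, payload in test_frames.items():
        matches = index.get(frame_digest(payload))
        if matches:
            overlap[frame_id] = matches
    return overlap


# Toy stand-ins for serialized frames (hypothetical data, not TUMTraf V2X).
train = {"tr_0": b"cloud-A", "tr_1": b"cloud-B"}
val = {"va_0": b"cloud-C"}
test = {"te_0": b"cloud-A", "te_1": b"cloud-C", "te_2": b"cloud-Z"}

leaks = find_overlap(test, {"train": train, "val": val})
leak_fraction = len(leaks) / len(test)
print(leaks)
print(f"leaked fraction: {leak_fraction:.2f}")
```

Applied to the figures in the summary, the same ratio would be 44 / 50 = 0.88, meaning only 6 test frames are unseen by training or validation.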

Key insights

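Train-test overlap can inflate 3D object detection scores: on TUMTraf V2X, where 44 of the 50 public test frames also appear in the training or validation splits, oversampling those frames during finetuning lifts a cooperative BEV fusion model from 0.85 to 0.89 mAP.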

Principles

Dataset overlap can artificially inflate benchmark scores.
Cooperative V2X fusion combines infrastructure and vehicle sensors for 3D object detection.
CenterPoint-style heads are effective for BEV prediction.

Method

Fuses three roadside cameras with an infrastructure-plus-vehicle point cloud in a shared BEV space, then predicts boxes with a CenterPoint-style head trained with a generalized IoU regression loss and re-ranked by an IoU quality head.
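To make the generalized IoU term concrete, here is a minimal sketch for axis-aligned boxes in the BEV plane. This is a simplification chosen for illustration: the summary does not say whether the detector's loss handles rotated or full 3D boxes, and the `Box` layout and function names are assumptions. GIoU subtracts from the plain IoU the share of the smallest enclosing box that is not covered by the union, so disjoint boxes still produce a useful signal.

```python
from typing import Tuple

# Hypothetical layout: (x_min, y_min, x_max, y_max) in BEV metres.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box; degenerate or inverted boxes count as 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def giou(a: Box, b: Box) -> float:
    """Generalized IoU in [-1, 1] for two axis-aligned boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    # Smallest axis-aligned box enclosing both inputs.
    hull = area((min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])))
    if union <= 0.0 or hull <= 0.0:
        return 0.0  # both boxes are degenerate, so no overlap is defined
    return inter / union - (hull - union) / hull


def giou_loss(pred: Box, target: Box) -> float:
    """Regression loss in [0, 2]; 0 means a perfect match."""
    return 1.0 - giou(pred, target)


target = (0.0, 0.0, 2.0, 2.0)
shifted = (1.0, 1.0, 3.0, 3.0)   # partial overlap
far = (4.0, 0.0, 6.0, 2.0)       # no overlap

print(giou_loss(target, target))
print(giou_loss(shifted, target))
print(giou_loss(far, target))
```

With plain IoU, both non-overlapping and barely-touching predictions score 0. The enclosing-box penalty is what keeps the loss growing as a disjoint prediction moves further from its target.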

In practice

Check test sets for overlap with training and validation data.
Employ camera-LiDAR fusion for V2X detection.
Use IoU quality re-ranking so that better-localized boxes rank higher.
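The re-ranking step can be sketched as a rescoring of each detection by its predicted IoU. The summary does not give the exact rule, so the geometric blend below, `cls_score ** (1 - alpha) * pred_iou ** alpha`, is an assumption borrowed from common IoU-aware detectors, and `alpha`, the tuple layout, and the toy scores are hypothetical.

```python
from typing import List, Tuple

# Hypothetical detection record: (label, classification score, predicted IoU).
Detection = Tuple[str, float, float]


def rescore(cls_score: float, pred_iou: float, alpha: float = 0.5) -> float:
    """Blend classification confidence with predicted localization quality."""
    iou = min(max(pred_iou, 0.0), 1.0)  # clamp the quality head's output to [0, 1]
    return cls_score ** (1.0 - alpha) * iou ** alpha


def rerank(detections: List[Detection], alpha: float = 0.5) -> List[Tuple[Detection, float]]:
    """Return detections with their blended scores, highest score first."""
    scored = [(det, rescore(det[1], det[2], alpha)) for det in detections]
    return sorted(scored, key=lambda item: item[1], reverse=True)


dets = [
    ("car", 0.90, 0.40),  # confident class, poorly localized box
    ("car", 0.80, 0.90),  # slightly less confident, well-localized box
]
ranked = rerank(dets)
for det, score in ranked:
    print(det, round(score, 3))
```

Setting `alpha` to 0 recovers the original classification ranking, and setting it to 1 ranks purely by predicted IoU. Average precision depends on the order of detections, which is why rescoring can change mAP without changing any box.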

Topics

Cooperative 3D Object Detection
Camera-LiDAR Fusion
Bird's-Eye-View
Dataset Overlap
V2X Systems
CenterPoint Detector

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.