Camera and LiDAR BEV Fusion for Cooperative 3D Object Detection on TUMTraf V2X
Summary
A camera-LiDAR fusion detector built for the TUMTraf V2X cooperative 3D object detection track of the DriveX 2026 challenge fuses three roadside cameras with a combined infrastructure-and-vehicle point cloud in a shared bird's-eye-view (BEV) space. Boxes are predicted by a CenterPoint-style head trained with a generalized IoU regression loss, alongside an IoU quality head used for re-ranking. The model reaches a 3D mAP of 0.85 on the public Codabench test split. The researchers found that 44 of the 50 test frames also appear in the training split (40 frames) or the validation split (4 frames). Fine-tuning with these overlapping frames oversampled raised the score to 0.89 mAP, and replacing the predictions on these frames with ground truth reached 0.99 mAP. All three configurations are documented with per-class results.
Key takeaway
Machine learning engineers evaluating 3D object detection models for V2X applications should check test sets for overlap with the training and validation splits before trusting a benchmark score. On this benchmark 44 of the 50 test frames overlap, and the gap between 0.85 mAP for the model's own predictions and 0.99 mAP with ground truth substituted on the overlapping frames shows how far undetected leakage can move the number. A score only reflects generalization if the test frames are actually unseen.
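One way to run such a check is to hash every frame file and look for test files whose bytes also occur in another split. The sketch below is illustrative only: the function names and the split layout are hypothetical, and it is not the procedure used in the original work. Byte-level hashing only catches exact copies, so frames that were re-encoded or renamed with altered contents would need a different key, such as a timestamp or frame ID.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large point cloud files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_overlap(
    test_files: Iterable[Path],
    reference_splits: Dict[str, Iterable[Path]],
) -> Dict[str, List[Path]]:
    """Map each reference split name to the test files that also occur in it."""
    # Hypothetical layout: split name -> iterable of frame file paths.
    digests = {
        name: {file_digest(path) for path in paths}
        for name, paths in reference_splits.items()
    }
    overlap: Dict[str, List[Path]] = {name: [] for name in digests}
    for test_path in test_files:
        digest = file_digest(test_path)
        for name, seen in digests.items():
            if digest in seen:
                overlap[name].append(test_path)
    return overlap
```

Reporting the overlap per split, rather than as a single count, mirrors the breakdown in the summary, where the overlapping frames are attributed to the training and validation splits separately.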
Key insights
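On TUMTraf V2X, 44 of the 50 public test frames also appear in the training or validation splits, so scores on that split can be inflated by memorization: oversampling the overlapping frames during fine-tuning raised mAP from 0.85 to 0.89, and substituting ground truth on them reached 0.99.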
Principles
- Dataset overlap can artificially inflate benchmark scores.
- Cooperative V2X fusion improves 3D object detection.
- CenterPoint-style heads are effective for BEV prediction.
Method
Fuses three roadside cameras with a combined infrastructure-and-vehicle point cloud in BEV space, then predicts boxes with a CenterPoint-style head trained with a generalized IoU regression loss and re-ranked by an IoU quality head.
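To make the generalized IoU term concrete, here is a minimal sketch for two axis-aligned rectangles in the BEV plane. It is a deliberate simplification: the detector in the source regresses 3D boxes, which also have a heading angle and a height, and the summary does not specify how its loss handles them. The function name and box layout are assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def giou_loss(pred: Box, target: Box) -> float:
    """Return 1 - GIoU for two axis-aligned boxes with positive area."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    area_pred = (px2 - px1) * (py2 - py1)
    area_target = (tx2 - tx1) * (ty2 - ty1)

    # Intersection is clamped at zero when the boxes do not overlap.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = area_pred + area_target - inter

    # Smallest axis-aligned box enclosing both inputs.
    hull_w = max(px2, tx2) - min(px1, tx1)
    hull_h = max(py2, ty2) - min(py1, ty1)
    hull = hull_w * hull_h

    iou = inter / union
    giou = iou - (hull - union) / hull
    return 1.0 - giou
```

Unlike plain IoU, the enclosing-box penalty keeps the loss informative when the boxes do not overlap at all: the loss keeps growing as the boxes move apart instead of saturating at 1.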
In practice
- Validate test sets for training data overlap.
- Employ Camera-LiDAR fusion for V2X detection.
- Use an IoU quality head to re-rank predicted boxes.
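IoU-based re-ranking is commonly implemented by blending the classification score with the predicted IoU before boxes are sorted. The sketch below uses a geometric blend controlled by an exponent alpha. Both the formula and the field names (cls_score, pred_iou) are assumptions for illustration, since the summary does not state how the source combines the two scores.

```python
from typing import Dict, List


def rerank(detections: List[Dict[str, float]], alpha: float = 0.5) -> List[Dict[str, float]]:
    """Sort detections by cls_score^(1 - alpha) * pred_iou^alpha, highest first."""
    rescored = []
    for det in detections:
        # Clamp the predicted IoU to [0, 1] so the power is well defined.
        iou = min(max(det["pred_iou"], 0.0), 1.0)
        score = det["cls_score"] ** (1.0 - alpha) * iou ** alpha
        rescored.append({**det, "score": score})
    return sorted(rescored, key=lambda d: d["score"], reverse=True)


boxes = [
    {"cls_score": 0.9, "pred_iou": 0.4},  # confident class, poorly localized
    {"cls_score": 0.8, "pred_iou": 0.9},  # slightly less confident, well localized
]
ranked = rerank(boxes, alpha=0.5)
```

With alpha set to 0 the ordering falls back to the raw classification score, so the exponent acts as a dial for how much localization quality is allowed to influence the ranking.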
Topics
- Cooperative 3D Object Detection
- Camera-LiDAR Fusion
- Bird's-Eye-View
- Dataset Overlap
- V2X Systems
- CenterPoint Detector
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.