Camera and LiDAR BEV Fusion for Cooperative 3D Object Detection on TUMTraf V2X

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems) · Depth: Expert, quick

Summary

A camera-LiDAR fusion detector, built for the TUMTraf V2X cooperative 3D object detection track of the DriveX 2026 challenge, fuses three roadside cameras with a combined infrastructure-plus-vehicle point cloud in a shared bird's-eye-view (BEV) space. Boxes are predicted by a CenterPoint-style head trained with a generalized IoU (GIoU) regression loss, and an IoU quality head re-ranks the detections. The model reaches a 3D mAP of 0.85 on the public Codabench test split. The authors found that 44 of the 50 test frames also appear in the training split (40 frames) or the validation split (4 frames). Two follow-up experiments quantify the effect of this overlap: finetuning with the overlapping frames oversampled raises the score to 0.89 mAP, and replacing the predictions on those frames with ground truth yields 0.99 mAP. All three configurations are documented with per-class results.

Key takeaway

Machine learning engineers evaluating 3D object detection models for V2X applications should check every test split for overlap with the training and validation splits before trusting a score. Here, 44 of the 50 test frames also appeared in the training or validation data, and substituting ground truth on just those frames moved the score from 0.85 to 0.99 mAP, which shows how severely unidentified leakage can misrepresent model performance. Confirm that test frames are disjoint from the data the model was trained or tuned on, especially on cooperative sensor fusion benchmarks, so that reported numbers reflect generalization to unseen scenes.
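The overlap check recommended above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the original article, which this summary does not describe. It assumes each frame is available as raw bytes keyed by a frame identifier; the names `digest` and `overlap_report` are hypothetical.

```python
import hashlib


def digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a frame's raw bytes."""
    return hashlib.sha256(payload).hexdigest()


def overlap_report(
    test: dict[str, bytes],
    others: dict[str, dict[str, bytes]],
) -> dict[str, list[str]]:
    """List, per named split, the test frame ids whose bytes also occur there.

    Assumption: leaked frames are byte-identical copies. Re-encoded or
    cropped duplicates would need a perceptual or metadata-based match.
    """
    report: dict[str, list[str]] = {}
    for split_name, frames in others.items():
        # Hash every frame of the other split once, then probe with test frames.
        seen = {digest(payload) for payload in frames.values()}
        report[split_name] = sorted(
            frame_id for frame_id, payload in test.items() if digest(payload) in seen
        )
    return report


if __name__ == "__main__":
    # Toy data standing in for image or point-cloud files read from disk.
    test_split = {"t0": b"frame-a", "t1": b"frame-b", "t2": b"frame-c"}
    other_splits = {
        "train": {"x0": b"frame-a"},
        "val": {"y0": b"frame-c", "y1": b"frame-z"},
    }
    for name, leaked in overlap_report(test_split, other_splits).items():
        print(f"{name}: {len(leaked)} of {len(test_split)} test frames overlap {leaked}")
```

In a real pipeline the dictionaries would be filled by reading each split's files, for example with `pathlib.Path.read_bytes`. Exact hashing only catches byte-identical copies, so matching on timestamps or sequence identifiers is a useful second check when files may have been re-exported.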

Key insights

Dataset overlap significantly inflates 3D object detection scores in cooperative BEV fusion models, as demonstrated on TUMTraf V2X.

Method

Fuses three roadside cameras with an infrastructure-plus-vehicle point cloud in BEV space, then predicts boxes with a CenterPoint-style head trained with a generalized IoU regression loss and re-ranked by an IoU quality head.
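The two box-level components named in the method can be illustrated with a small sketch. This summary gives neither the exact loss formulation nor the re-ranking rule, so everything below is an assumption: the GIoU loss is shown for axis-aligned BEV boxes only (a detector of this kind would normally handle rotated 3D boxes), and the re-ranking uses a geometric blend of classification score and predicted IoU with a hypothetical weight `alpha`, which is one common form of IoU-aware rescoring.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in BEV


def giou_loss_axis_aligned(a: Box, b: Box) -> float:
    """Return 1 - GIoU for two axis-aligned boxes with positive area."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    iou = inter / union
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


def rerank_score(cls_score: float, pred_iou: float, alpha: float = 0.5) -> float:
    """Blend classification confidence with predicted localization quality.

    alpha = 0 keeps the classification score; alpha = 1 uses only the
    predicted IoU. The value 0.5 is an illustrative default, not a tuned one.
    """
    pred_iou = min(max(pred_iou, 0.0), 1.0)  # keep the quality estimate in [0, 1]
    return cls_score ** (1.0 - alpha) * pred_iou ** alpha


if __name__ == "__main__":
    print(giou_loss_axis_aligned((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0)))
    print(rerank_score(0.81, 1.0))
```

Unlike a plain IoU loss, the GIoU term still provides a training signal when predicted and target boxes do not overlap, because the enclosing-box penalty grows with their separation. The rescoring step lowers the rank of confident but poorly localized boxes, which matters for mAP at strict IoU thresholds.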

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.