OmniTraffic: A Controllable Generation Pipeline and Benchmark for Spatio-Temporal Traffic Reasoning

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

OmniTraffic is a controllable generation pipeline and benchmark for spatio-temporal traffic reasoning. It addresses a limitation of existing benchmarks, which focus on passive visual recognition. The pipeline reconstructs 12 real-world intersections into editable 3D environments and supplements them with surveillance footage from two countries, supporting both controlled and natural-condition evaluation. The benchmark defines a three-level task hierarchy: scene perception, multi-view and temporal reasoning, and decision support. Using structured traffic metadata, OmniTraffic generates 8M visual question answering (VQA) samples and includes a 3K human-verified test set, covering vehicle states, lane functions, view-BEV (bird's-eye view) correspondence, temporal dynamics, and signal-phase analysis. An evaluation of eleven frontier multimodal large language models (MLLMs) revealed a substantial human-model gap, particularly on topology-grounded and spatio-temporal reasoning tasks. Fine-tuning a lightweight MLLM on simulated OmniTraffic data improved its performance on real-world traffic scenes, demonstrating the value of simulation-generated supervision.

Key takeaway

For AI Scientists and Machine Learning Engineers developing MLLMs for autonomous driving or traffic management, this work highlights a critical gap: current models significantly underperform humans in spatio-temporal and topology-grounded traffic reasoning. Integrate OmniTraffic into your evaluation pipelines to test model capabilities beyond basic recognition. Use its extensible pipeline and simulation-generated supervision to fine-tune models on these specific reasoning deficiencies, so that they hold up better in real-world deployment.
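One way to act on this is to report accuracy per task level rather than as a single aggregate, so that weaknesses in reasoning are not masked by strong perception scores. The sketch below is a minimal illustration under stated assumptions: the `accuracy_by_level` helper, the level labels, the exact-match metric, and the toy records are all hypothetical and are not taken from the OmniTraffic release or its results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_level(results: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute accuracy per task level.

    Each item in `results` is (task_level, predicted_answer, reference_answer).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for level, pred, ref in results:
        total[level] += 1
        # Assumed metric: case- and whitespace-insensitive exact match.
        if pred.strip().lower() == ref.strip().lower():
            correct[level] += 1
    return {level: correct[level] / total[level] for level in total}


# Toy records invented for illustration; these are NOT benchmark results.
toy_results = [
    ("scene_perception", "left-turn", "left-turn"),
    ("scene_perception", "Through", "through"),
    ("multi_view_temporal_reasoning", "yes", "no"),
    ("multi_view_temporal_reasoning", "no", "no"),
    ("decision_support", "extend green", "hold red"),
]

scores = accuracy_by_level(toy_results)
for level, acc in scores.items():
    print(f"{level}: {acc:.2f}")
```

Running the same breakdown for a human baseline and subtracting the two dictionaries level by level would expose where the human-model gap is largest.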

Key insights

OmniTraffic is a benchmark and pipeline for spatio-temporal traffic reasoning, revealing significant MLLM gaps in complex traffic understanding.

Method

OmniTraffic reconstructs 12 real-world intersections into editable 3D environments, then uses structured metadata to generate multi-view VQA samples across diverse traffic scenarios, organized into a three-level task hierarchy.
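As a rough illustration of how structured metadata can drive question generation, the sketch below turns one hypothetical per-vehicle record into templated VQA pairs, one per task level. The field names (`lane_function`, `speed_t0_mps`, `speed_t1_mps`, `signal_phase`), the question templates, the green-means-proceed rule, and the `make_vqa_samples` helper are assumptions made for this example; they are not the OmniTraffic schema or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VehicleRecord:
    """Hypothetical per-vehicle metadata exported from a simulated scene."""
    vehicle_id: str
    lane_function: str    # e.g. "left-turn" or "through"
    speed_t0_mps: float   # speed at the earlier frame, metres per second
    speed_t1_mps: float   # speed at the later frame, metres per second
    signal_phase: str     # signal state governing the vehicle's lane


@dataclass(frozen=True)
class VQASample:
    level: str
    question: str
    answer: str


def make_vqa_samples(rec: VehicleRecord) -> List[VQASample]:
    """Build one templated question per task level from a single record."""
    # Temporal answer: compare speeds at the two frames.
    slowed = "yes" if rec.speed_t1_mps < rec.speed_t0_mps else "no"
    # Decision answer: simplified rule assumed for illustration only.
    may_proceed = "yes" if rec.signal_phase == "green" else "no"
    vid = rec.vehicle_id
    return [
        VQASample(
            "scene_perception",
            f"What is the function of the lane occupied by vehicle {vid}?",
            rec.lane_function,
        ),
        VQASample(
            "multi_view_temporal_reasoning",
            f"Did vehicle {vid} slow down between the two frames?",
            slowed,
        ),
        VQASample(
            "decision_support",
            f"Given the current signal phase, may vehicle {vid} proceed?",
            may_proceed,
        ),
    ]


samples = make_vqa_samples(VehicleRecord("v17", "left-turn", 8.0, 0.0, "red"))
for s in samples:
    print(f"[{s.level}] {s.question} -> {s.answer}")
```

Because answers are computed from metadata rather than annotated by hand, editing the scene (for example, changing the signal phase) regenerates consistent question-answer pairs, which is the general appeal of a metadata-driven pipeline.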

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.