Automating Crash Diagram Generation Using Vision-Language Models: A Case Study on Multi-Lane Roundabouts

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Advanced, extended

Summary

This study investigated whether Vision-Language Models (VLMs) can automate crash diagram generation from police reports, focusing on multi-lane roundabouts. The researchers developed a three-part structured prompt framework and a 10-metric evaluation system that rates each diagram on semantic accuracy, spatial fidelity, and visual clarity. They tested three models, GPT-4o, Gemini-1.5-Flash, and Janus-4o, on 79 crash reports from two high-volume multi-lane roundabouts in New York State. GPT-4o achieved the highest average score, 6.29 out of 10, and showed the strongest spatial reasoning and the closest alignment between extracted and visualized crash data. Gemini-1.5-Flash scored 5.28 and Janus-4o scored 3.64. The findings suggest that VLMs could improve efficiency and consistency in crash analysis, but that they do not yet deliver the spatial precision required for fully autonomous, engineering-grade applications.

Key takeaway

For transportation safety professionals seeking to streamline crash documentation, VLMs such as GPT-4o can reduce manual workload and improve consistency. Because the models still fall short on spatial precision and geometric accuracy, a human should review every generated diagram, especially when it will be used for operational or legal purposes. Consider a hybrid pipeline in which the language model extracts structured crash information and an existing professional diagramming tool draws the diagram.
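A hybrid pipeline of this kind might be sketched as follows: the model's output is parsed as JSON and validated before anything reaches a diagramming tool. This is a minimal illustration under stated assumptions, not the study's method. The field names (`vehicle_id`, `entry_approach`, `lane`, `maneuver`), the allowed values, and the `parse_extraction` helper are all hypothetical, and no real model API is called.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Hypothetical vocabulary: the study's actual extraction schema is not given here.
ALLOWED_APPROACHES = {"north", "south", "east", "west"}
ALLOWED_MANEUVERS = {"entering", "circulating", "exiting", "lane_change"}


@dataclass(frozen=True)
class VehicleRecord:
    vehicle_id: str
    entry_approach: str
    lane: int
    maneuver: str


def parse_extraction(raw_json: str) -> list[VehicleRecord]:
    """Parse model output and reject records a diagramming tool could not place."""
    payload = json.loads(raw_json)
    records = []
    for item in payload["vehicles"]:
        record = VehicleRecord(
            vehicle_id=str(item["vehicle_id"]),
            entry_approach=str(item["entry_approach"]).lower(),
            lane=int(item["lane"]),
            maneuver=str(item["maneuver"]).lower(),
        )
        if record.entry_approach not in ALLOWED_APPROACHES:
            raise ValueError(f"unknown approach: {record.entry_approach}")
        if record.maneuver not in ALLOWED_MANEUVERS:
            raise ValueError(f"unknown maneuver: {record.maneuver}")
        if record.lane < 1:
            raise ValueError(f"lane must be 1 or greater, got {record.lane}")
        records.append(record)
    return records


# Made-up model output for illustration only (not real crash data).
sample = json.dumps({
    "vehicles": [
        {"vehicle_id": "V1", "entry_approach": "North", "lane": 2, "maneuver": "circulating"},
        {"vehicle_id": "V2", "entry_approach": "east", "lane": 1, "maneuver": "entering"},
    ]
})
vehicles = parse_extraction(sample)
```

Rejecting malformed records at this boundary keeps geometric placement in the hands of the diagramming tool and the human reviewer, which is where the summary says precision is still needed.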

Key insights

VLMs can automate much of crash diagram generation; of the three models tested, GPT-4o showed the strongest spatial reasoning and semantic accuracy.

Method

A three-part structured prompt guides each VLM through interpreting the crash report, extracting its details, and synthesizing the diagram. A 10-metric binary evaluation system then rates each diagram on semantic accuracy, spatial fidelity, and visual clarity.
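The binary evaluation can be illustrated with a short sketch. It assumes that a diagram's score is simply the number of metrics it passes (giving a value out of 10) and that a model's reported score is the mean over all reports; the summary does not spell out either step. The metric names and their grouping below are hypothetical placeholders, since the study's ten metrics are not listed here.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical metric names, grouped by the three dimensions named in the text.
METRICS = {
    "semantic_accuracy": ["vehicle_count", "vehicle_types", "collision_type", "travel_directions"],
    "spatial_fidelity": ["roundabout_geometry", "lane_placement", "impact_location"],
    "visual_clarity": ["labels_legible", "no_overlap", "legend_present"],
}
ALL_METRICS = [name for group in METRICS.values() for name in group]


def diagram_score(checks: dict[str, bool]) -> int:
    """Count the metrics a single diagram passes (0 to 10)."""
    missing = set(ALL_METRICS) - set(checks)
    if missing:
        raise KeyError(f"missing metric results: {sorted(missing)}")
    return sum(1 for name in ALL_METRICS if checks[name])


def model_average(per_report_checks: list[dict[str, bool]]) -> float:
    """Average the per-diagram scores across all evaluated reports."""
    return mean(diagram_score(checks) for checks in per_report_checks)


# Two made-up evaluations: the first fails 3 metrics, the second fails 5.
failed_a = {"lane_placement", "impact_location", "no_overlap"}
failed_b = failed_a | {"roundabout_geometry", "legend_present"}
report_a = {name: name not in failed_a for name in ALL_METRICS}
report_b = {name: name not in failed_b for name in ALL_METRICS}

average = model_average([report_a, report_b])  # scores 7 and 5 average to 6
```

Under these assumptions, a figure such as GPT-4o's 6.29 would mean that its diagrams passed a little over six of the ten checks on average.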

Best for: Machine Learning Engineer, Computer Vision Engineer, AI Scientist, AI Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.