Automating Crash Diagram Generation Using Vision-Language Models: A Case Study on Multi-Lane Roundabouts
Summary
A study investigated using vision-language models (VLMs) to automate crash diagram generation from police crash reports, focusing on multilane roundabouts. The researchers developed a three-part structured prompt framework and a 10-metric evaluation system that scores diagram quality on semantic accuracy, spatial fidelity, and visual clarity. Three models (GPT-4o, Gemini-1.5-Flash, and Janus-4o) were tested on 79 crash reports from two high-volume multilane roundabouts in New York State. GPT-4o achieved the highest average score, 6.29 out of 10, showing stronger spatial reasoning and closer alignment between the crash details it extracted and the details it drew. Gemini-1.5-Flash scored 5.28 and Janus-4o scored 3.64. The findings highlight the potential of VLMs to make crash analysis more efficient and consistent, but they also show that current models fall short of the spatial precision required for fully autonomous, engineering-grade applications.
Key takeaway
For transportation safety professionals seeking to streamline crash documentation, integrating VLMs such as GPT-4o can reduce manual workload and improve consistency. However, you must maintain human oversight of spatial precision and geometric accuracy, especially for operational or legal applications. Consider a hybrid pipeline in which the language model extracts crash details and existing professional diagramming tools produce the final drawing.
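One way to picture the hybrid pipeline is a validation step between the VLM and the diagramming tool: the model returns its extraction as JSON, and the pipeline checks it before any drawing is made. The sketch below assumes a hypothetical schema (`VehicleRecord`, `CrashRecord`, and field names such as `entry_leg` and `lane` are illustrative only); the summary does not specify the extraction format or the input format of any professional diagramming tool.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Hypothetical schema: the field names are illustrative, not taken from the study
# or from any specific diagramming tool.
REQUIRED_VEHICLE_FIELDS = ("vehicle_id", "entry_leg", "lane", "maneuver")


@dataclass
class VehicleRecord:
    vehicle_id: str
    entry_leg: str               # approach leg, e.g. "north"
    lane: str                    # circulating lane, e.g. "inner" or "outer"
    maneuver: str                # e.g. "entering", "circulating", "exiting"
    exit_leg: str | None = None  # None when the report does not state an exit


@dataclass
class CrashRecord:
    report_id: str
    collision_type: str
    vehicles: list[VehicleRecord]


def parse_extraction(raw_json: str) -> CrashRecord:
    """Validate VLM-extracted JSON before handing it to a diagramming tool.

    Raises ValueError (json.JSONDecodeError is a subclass) so that incomplete
    extractions can be routed to manual review instead of being drawn.
    """
    data = json.loads(raw_json)
    if "report_id" not in data:
        raise ValueError("extraction is missing report_id")
    vehicles = []
    for entry in data.get("vehicles", []):
        missing = [field for field in REQUIRED_VEHICLE_FIELDS if not entry.get(field)]
        if missing:
            raise ValueError(f"vehicle record missing fields: {missing}")
        vehicles.append(
            VehicleRecord(
                vehicle_id=str(entry["vehicle_id"]),
                entry_leg=entry["entry_leg"],
                lane=entry["lane"],
                maneuver=entry["maneuver"],
                exit_leg=entry.get("exit_leg"),
            )
        )
    if not vehicles:
        raise ValueError("no vehicles extracted")
    return CrashRecord(
        report_id=str(data["report_id"]),
        collision_type=data.get("collision_type", "unknown"),
        vehicles=vehicles,
    )
```

Rejecting an extraction outright, rather than drawing a partial diagram, keeps the human reviewer in the loop exactly where the model's output is incomplete.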
Key insights
VLMs can automate much of crash diagram generation, with GPT-4o showing the strongest spatial reasoning and semantic accuracy of the three models tested.
Principles
- Structured, multi-stage prompts help guide VLMs through complex tasks such as interpreting crash reports.
- Binary (pass/fail) scoring enforces strict, unambiguous judgments suited to safety-critical standards.
Method
A three-part structured prompt guides VLMs through interpretation, extraction, and visual synthesis of crash reports. A 10-metric binary evaluation system assesses semantic accuracy, spatial fidelity, and visual clarity.
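A minimal sketch of how a 10-metric binary evaluation can yield a score out of 10: each metric is judged pass (1) or fail (0), the passes are summed per diagram, and a model's figure is the mean over all evaluated reports. This aggregation is an assumption consistent with the "out of 10" scores reported above; the summary does not list the individual metrics or state exactly how the averages were computed, so the functions below (`diagram_score`, `model_average`) are hypothetical.

```python
from __future__ import annotations

NUM_METRICS = 10  # the evaluation system uses ten binary metrics


def diagram_score(metric_passes: list[bool]) -> int:
    """Score one diagram: each metric counts 1 if passed, 0 otherwise (range 0-10)."""
    if len(metric_passes) != NUM_METRICS:
        raise ValueError(
            f"expected {NUM_METRICS} metric judgments, got {len(metric_passes)}"
        )
    return sum(1 for passed in metric_passes if passed)


def model_average(judgments_per_report: list[list[bool]]) -> float:
    """Mean diagram score for one model across all evaluated crash reports."""
    if not judgments_per_report:
        raise ValueError("no reports to average")
    scores = [diagram_score(judgments) for judgments in judgments_per_report]
    return sum(scores) / len(scores)
```

For example, with two made-up reports where a model passes 7 and 5 metrics, `model_average` returns 6.0. Because no partial credit is allowed, a diagram that is nearly right on a metric scores the same as one that is badly wrong on it, which is what makes binary scoring strict.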
In practice
- Prefer GPT-4o when spatial reasoning matters; it scored highest of the three models tested.
- Implement structured prompts for consistent VLM outputs.
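The three prompt stages named in the Method section (interpretation, extraction, and visual synthesis) can be assembled programmatically so every report receives the same instructions in the same order. This is a sketch only: the stage wording in `PROMPT_STAGES` and the helper `build_structured_prompt` are hypothetical, since the study's actual prompt text is not reproduced in this summary, and no particular model API is assumed.

```python
# Hypothetical wording for each stage; the study's actual prompt text is not
# reproduced in this summary.
PROMPT_STAGES = (
    ("Interpretation",
     "Read the police crash report below and summarize the sequence of events "
     "in plain language."),
    ("Extraction",
     "List each vehicle involved with its approach leg, circulating lane, "
     "intended exit, maneuver, and the point of impact."),
    ("Visual synthesis",
     "Using only the facts extracted in Part 2, draw a top-down diagram of the "
     "multilane roundabout showing each vehicle's path and the collision point."),
)


def build_structured_prompt(report_text: str) -> str:
    """Assemble a three-part prompt: numbered stages first, then the report text."""
    sections = [
        f"Part {number}: {name}\n{instruction}"
        for number, (name, instruction) in enumerate(PROMPT_STAGES, start=1)
    ]
    # Append the report last so every stage instruction refers to the same source.
    return "\n\n".join(sections) + "\n\nCrash report:\n" + report_text.strip()
```

Tying the visual-synthesis stage explicitly to the extraction stage's output is one way to encourage the alignment between extracted and drawn crash details that the study measured.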
Topics
- Vision-Language Models
- Crash Diagram Automation
- Multi-lane Roundabouts
- GPT-4o Performance
- Traffic Safety Analysis
Best for: Machine Learning Engineer, Computer Vision Engineer, AI Scientist, AI Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published on arXiv.org (cs.AI).