An Analysis Focused on Women's Safety: Can VAD Models Be Enhanced by a Multi-modal Dataset?

· Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

The ExtrAnom dataset is introduced to address the lack of video anomaly detection (VAD) resources focused on women's safety. Existing VAD datasets, which often consist of high-resolution, well-lit footage, fail to represent women-centric anomalies such as chain snatching, stalking, and inappropriate touch, especially as they appear in low-light or low-resolution surveillance footage. ExtrAnom comprises 1001 real-world videos (500 normal, 501 anomalous) covering 5 types of women-centric crime. The collection includes 8% low-light, 13% low-resolution, and 15% long-shot videos. Each video is paired with one human-generated and three LLM-generated textual descriptions, enabling cross-modal and vision-language model (VLM) based validation. Benchmarks against popular VAD datasets and state-of-the-art (SOTA) methods show that existing models perform poorly on women-centric anomalies, which underscores the need for a dataset like ExtrAnom.

Key takeaway

AI scientists and machine learning engineers developing surveillance systems should recognize that current VAD models, including SOTA multi-modal LLMs, perform poorly on women-centric anomalies because of limitations in their training data. Prioritize training and fine-tuning on specialized datasets such as ExtrAnom, which covers diverse real-world conditions and provides detailed textual annotations, to improve the detection of critical events such as stalking and chain snatching. More accurate detection of these events should in turn make public safety applications more reliable.

Key insights

Existing VAD models fail to detect women-centric anomalies due to inadequate, unrepresentative training data.

Method

ExtrAnom was built by collecting real-world videos from diverse sources, categorizing them into 5 women-centric crime types, and attaching multi-modal textual annotations to each video: one description written by a human and three generated by LLMs (ChatGPT, DeepSeek, Mistral).
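
As a rough illustration of the annotation layout described above, the sketch below models one dataset entry in Python. It is not the authors' release format: the class name VideoRecord, its field names, and the file path in the usage note are hypothetical, and it assumes that each of the three named LLMs contributes exactly one description per video, which the summary implies but does not state. The crime category is kept as free text because only some of the 5 categories are named here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# LLMs named in the summary; one description per model is an assumption.
LLM_SOURCES = ("ChatGPT", "DeepSeek", "Mistral")


@dataclass
class VideoRecord:
    """Hypothetical record for one annotated video (not the official schema)."""
    video_path: str
    is_anomalous: bool
    category: Optional[str]          # crime type; None for normal videos
    human_description: str
    llm_descriptions: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record is complete."""
        problems = []
        if self.is_anomalous and not self.category:
            problems.append("anomalous video is missing a category")
        if not self.is_anomalous and self.category:
            problems.append("normal video should not carry a crime category")
        if not self.human_description.strip():
            problems.append("missing human description")
        missing = [s for s in LLM_SOURCES
                   if not self.llm_descriptions.get(s, "").strip()]
        if missing:
            problems.append("missing LLM descriptions: " + ", ".join(missing))
        return problems

    def all_descriptions(self) -> List[str]:
        """Human description first, then LLM descriptions in a fixed order."""
        return [self.human_description] + [
            self.llm_descriptions[s] for s in LLM_SOURCES
            if s in self.llm_descriptions
        ]
```

A record built as VideoRecord("clips/example.mp4", True, "chain snatching", "A person grabs a necklace.", {...}) with one entry per LLM would pass validate() and yield four descriptions from all_descriptions(), which is the set a cross-modal check would compare against the video.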

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.