An Analysis Focused on Women's Safety: Can VAD Models Be Enhanced by a Multi-modal Dataset?
Summary
The ExtrAnom dataset is introduced to address the lack of resources for video anomaly detection (VAD) focused on women's safety. Existing VAD datasets, often high-resolution and well-lit, fail to represent women-centric anomalies such as chain snatching, stalking, and inappropriate touch, especially in low-light or low-resolution surveillance footage. ExtrAnom comprises 1001 real-world videos (500 normal, 501 anomalous) categorized into five types of women-centric crimes; 8% of the videos are low-light, 13% low-resolution, and 15% long-shot. Each video comes with one human-written and three LLM-generated textual descriptions, enabling cross-modal and VLM-based validation. Benchmarking against popular VAD datasets and SOTA methods shows that existing models perform poorly on women-centric anomalies, which underlines the need for ExtrAnom.
Key takeaway
AI scientists and machine learning engineers developing surveillance systems should recognize that current VAD models, including SOTA multi-modal LLMs, frequently misclassify women-centric anomalies because of data limitations. Prioritize training and fine-tuning models on specialized datasets such as ExtrAnom, which covers diverse real-world conditions and provides detailed textual annotations, to improve accuracy in detecting critical events such as stalking and chain snatching. This should lead to more reliable public safety applications.
Key insights
Existing VAD models fail to detect women-centric anomalies due to inadequate, unrepresentative training data.
Principles
- Effective VAD depends on data that reflects real-world surveillance conditions (low-light, low-resolution).
- Multi-modal datasets with textual annotations enhance VLM performance for fine-grained anomaly detection.
Method
ExtrAnom was built by collecting real-world videos from diverse sources, categorizing them into five women-centric crime types, and annotating each video with textual descriptions written by a human and generated by LLMs (ChatGPT, DeepSeek, Mistral).
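The annotation layout described above (one human description plus three LLM descriptions per video) can be sketched as a small record type. This is only an illustration: the class name, field names, and the one-description-per-LLM mapping are assumptions, not the schema of the actual ExtrAnom release.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: each of the three LLMs named in the summary contributes
# exactly one description per video. The keys are illustrative only.
LLM_SOURCES = ("chatgpt", "deepseek", "mistral")


@dataclass
class VideoRecord:
    """Hypothetical record for one video and its textual annotations."""
    video_id: str
    is_anomalous: bool
    crime_type: Optional[str]  # None for normal videos
    human_description: str
    llm_descriptions: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if the record is incomplete."""
        missing = [s for s in LLM_SOURCES if not self.llm_descriptions.get(s)]
        if missing:
            raise ValueError(f"missing LLM descriptions: {missing}")
        if not self.human_description:
            raise ValueError("missing human description")
        if self.is_anomalous and self.crime_type is None:
            raise ValueError("anomalous video needs a crime type")

    def all_descriptions(self) -> List[str]:
        """Return the human description followed by the LLM descriptions."""
        return [self.human_description] + [
            self.llm_descriptions[s] for s in LLM_SOURCES
        ]


# Toy example with made-up text, not real dataset content.
record = VideoRecord(
    video_id="example_0001",
    is_anomalous=True,
    crime_type="chain snatching",
    human_description="A rider grabs a necklace from a pedestrian.",
    llm_descriptions={
        "chatgpt": "A person on a motorbike snatches a chain.",
        "deepseek": "A motorcyclist pulls a chain from a woman's neck.",
        "mistral": "A passing rider takes jewellery from a pedestrian.",
    },
)
record.validate()
```

Keeping the human and LLM descriptions in separate fields makes it easy to compare them or to hold one out when validating a vision-language model.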
In practice
- Use ExtrAnom to train VLMs for detecting subtle women-centric crimes.
- Incorporate low-light and low-resolution video data for robust VAD model development.
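To check whether a model actually copes with difficult footage, one option is to report accuracy separately for each capture condition rather than as a single number. The sketch below assumes every evaluated video carries a condition tag such as "low_light" or "low_resolution"; the tag names and the toy predictions are hypothetical and not drawn from ExtrAnom.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each sample is (condition tag, true label, predicted label),
# where labels are 1 for anomalous and 0 for normal.
Sample = Tuple[str, int, int]


def accuracy_by_condition(samples: Iterable[Sample]) -> Dict[str, float]:
    """Compute classification accuracy separately for each condition tag."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, y_true, y_pred in samples:
        total[condition] += 1
        correct[condition] += int(y_true == y_pred)
    return {c: correct[c] / total[c] for c in total}


# Toy predictions for illustration only.
toy_samples = [
    ("low_light", 1, 0),
    ("low_light", 1, 1),
    ("low_resolution", 0, 0),
    ("standard", 1, 1),
    ("standard", 0, 0),
]

for condition, acc in accuracy_by_condition(toy_samples).items():
    print(f"{condition}: {acc:.2f}")
```

A breakdown like this shows whether errors concentrate in low-light or low-resolution clips, which an aggregate score would hide.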
Topics
- Video Anomaly Detection
- Women's Safety
- Multi-modal LLMs
- ExtrAnom Dataset
- Surveillance Videos
- Vision Language Models
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.