Classifying by Proxy: Explainable and Reproducible Ensemble of Proxy Tasks for Child Sexual Abuse Imagery Classification
Summary
Child Sexual Abuse Imagery (CSAI) classification systems are crucial for law enforcement and content removal, yet their development faces significant hurdles. The data is highly sensitive, access to it is tightly restricted, and models offer little explainability, so studies are hard to reproduce, distribute, compare, or validate. This work applies an ensemble of Proxy Tasks (tasks correlated with CSAI classification) directly to real CSAI for the first time. The method, which features a new selection of relevant Proxy Tasks and training adaptations, improves reproducibility, explainability, and the security of system distribution. The final model achieves competitive results, with 91.9% balanced accuracy on the RCPD dataset. The ensemble also surpasses the best-in-class DINO representation learning model in accuracy and provides classification explanations, a feature often missing in single deep learning models.
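The reported figure is a balanced accuracy, which is the mean of the per-class recalls and so is not inflated when one class dominates the data. The sketch below computes it for a binary task; the labels are toy values invented for illustration and are unrelated to RCPD or to the paper's results.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    return 0.5 * (true_pos / positives + true_neg / negatives)


def plain_accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels (invented): 2 positives and 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(plain_accuracy(y_true, y_pred))
print(balanced_accuracy(y_true, y_pred))
```

On imbalanced data the two metrics diverge: here the missed positive costs far more under balanced accuracy than under plain accuracy.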
Key takeaway
AI Scientists developing sensitive content classification systems, particularly for law enforcement, should prioritize explainable and reproducible methods. This research demonstrates that an ensemble of Proxy Tasks can achieve high accuracy (91.9% balanced accuracy on RCPD) while providing classification explanations, a significant advantage over single deep learning models such as DINO. Consider integrating Proxy Task ensembles into your development pipeline to work around data access restrictions and meet operational demands for transparency and validation.
Key insights
An ensemble of Proxy Tasks enhances CSAI classification with improved explainability, reproducibility, and security.
Principles
- Proxy tasks can overcome data access limitations.
- An ensemble of proxy tasks can show which tasks drove a classification.
- Explainability is critical for law enforcement applications.
Method
The method selects relevant Proxy Tasks from the CSAI literature, adapts the original framework, and ensembles the selected tasks to classify real CSAI data.
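To illustrate how an ensemble of proxy-task outputs can yield both a decision and an explanation, here is a minimal sketch. It assumes a logistic combination of per-task scores, which is not confirmed by the source; the task names, weights, and bias are hypothetical placeholders, and the summary does not say how the paper actually combines its tasks.

```python
import math

# Hypothetical proxy-task names, weights, and bias, for illustration only.
# The source does not list the tasks or the combination rule.
WEIGHTS = {"proxy_task_a": 2.0, "proxy_task_b": 1.5, "proxy_task_c": -1.0}
BIAS = -1.2


def classify(proxy_scores):
    """Combine per-task scores (each in [0, 1]) with a logistic model.

    Returns the positive-class probability and each task's additive
    contribution to the logit, which acts as a simple explanation.
    """
    contributions = {
        name: weight * proxy_scores[name] for name, weight in WEIGHTS.items()
    }
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions


# Invented scores standing in for the outputs of three proxy-task models.
scores = {"proxy_task_a": 0.9, "proxy_task_b": 0.8, "proxy_task_c": 0.1}
probability, contributions = classify(scores)

# List tasks by how strongly they pushed the decision, largest first.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f}")
print(f"probability: {probability:.3f}")
```

Because the final classifier sees only proxy-task scores, it can in principle be retrained and shared without distributing the sensitive images themselves, and every decision can be traced back to named tasks.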
In practice
- Explore proxy tasks for sensitive data classification.
- Use proxy-task ensembles where classifications must be explained.
- Prioritize explainability in forensic AI tools.
Topics
- Child Sexual Abuse Imagery
- Proxy Tasks
- Ensemble Learning
- Explainable AI
- Reproducibility
- Computer Vision
Best for: Research Scientist, AI Scientist, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.