Generalized Disguise Makeup Presentation Attack Detection Using an Attention-Guided Patch-Based Framework
Summary
Researchers from Shahid Beheshti University propose a generalized framework for detecting disguise makeup presentation attacks, which are hard to detect because cosmetics and prosthetics alter facial appearance realistically. The framework has two phases. In the first, a style-invariant full-face model, trained with metric learning and enhanced by a whitening transformation, produces region attention scores via Grad-CAM. In the second, those scores guide a patch-based analysis in which region-specific subnetworks, also trained with metric learning, perform fine-grained local discrimination. The team also collected a new, diverse dataset of live and disguise makeup faces under real-world conditions. The reported results are 8.97% ACER and 9.76% EER on the collected dataset and, on SIW-Mv2, 0% ACER on Obfuscation and Impersonation attacks and 1.34% on Cosmetics attacks, which the authors report as outperforming prior work.
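The error rates quoted in the summary follow the usual presentation attack detection definitions: APCER is the fraction of attack samples accepted as live, BPCER is the fraction of live samples rejected as attacks, and ACER is their mean. The sketch below shows that arithmetic in its simplest single-attack-type form. It assumes scores where higher means "more likely live" and a hypothetical fixed threshold; the score values are invented for illustration and are not taken from the paper.

```python
def acer(live_scores, attack_scores, threshold):
    """Return (APCER, BPCER, ACER) for scores where higher means 'more live'."""
    # APCER: attack presentations wrongly accepted as live.
    apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # BPCER: live presentations wrongly rejected as attacks.
    bpcer = sum(s < threshold for s in live_scores) / len(live_scores)
    # ACER: the average of the two error rates.
    return apcer, bpcer, (apcer + bpcer) / 2


# Illustrative scores only (assumption, not data from the paper).
live = [0.9, 0.8, 0.7, 0.4]
attack = [0.1, 0.2, 0.6, 0.3]
apcer, bpcer, acer_value = acer(live, attack, threshold=0.5)
```

With these toy scores, one of four attacks is accepted and one of four live samples is rejected, so all three rates come out to 0.25.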
Key takeaway
For researchers building robust face anti-spoofing systems, this framework indicates that combining global attention with localized, patch-based analysis and metric learning improves detection of sophisticated disguise makeup attacks. Consider two-phase, attention-guided architectures and style-invariant feature extraction when generalization across diverse presentation attacks matters, especially attacks involving subtle or localized alterations.
Key insights
A two-phase, attention-guided, patch-based framework effectively detects realistic disguise makeup attacks.
Principles
- Combine global context with local detail.
- Metric learning improves embedding separability.
- Style invariance enhances generalization.
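To make the metric learning principle concrete, here is a minimal sketch of a triplet loss, one common metric learning objective. The summary does not say which loss the paper uses, so the choice of triplet loss, the margin value, and the toy 2-D embeddings are all assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Penalize triplets where the negative is not at least `margin`
    farther from the anchor than the positive is."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy embeddings (hypothetical): two live faces and two makeup attacks.
live_a = [0.0, 0.0]
live_b = [0.0, 1.0]
makeup_far = [3.0, 4.0]   # well separated from the live samples
makeup_near = [0.0, 1.2]  # nearly as close to live_a as live_b is

well_separated = triplet_loss(live_a, live_b, makeup_far)
too_close = triplet_loss(live_a, live_b, makeup_near)
```

The loss is zero once the attack embedding sits far enough from the live cluster and positive otherwise, which is the sense in which metric learning improves embedding separability.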
Method
A full-face network generates attention scores via Grad-CAM. These scores guide patch-specific subnetworks that learn and classify localized features. Both phases use metric learning and style-invariant features.
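One way to picture how a Grad-CAM heatmap could become per-region attention scores is to average the heatmap inside each facial region and normalize across regions. The sketch below does that under stated assumptions: the region names, box coordinates, and normalization scheme are hypothetical, since the summary does not describe how the paper computes its scores.

```python
def region_attention_scores(heatmap, regions):
    """Average a 2-D attention map inside each box, then normalize to sum to 1.

    heatmap: list of rows of non-negative floats (e.g. a Grad-CAM map).
    regions: dict of name -> (top, left, bottom, right), bottom/right exclusive.
    """
    means = {}
    for name, (top, left, bottom, right) in regions.items():
        values = [v for row in heatmap[top:bottom] for v in row[left:right]]
        means[name] = sum(values) / len(values)
    total = sum(means.values())
    if total == 0:
        # No attention anywhere: fall back to uniform scores.
        return {name: 1.0 / len(means) for name in means}
    return {name: m / total for name, m in means.items()}


# Toy 4x4 heatmap (assumption): attention concentrated on the lower left.
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.6, 0.6, 0.2, 0.2],
    [0.6, 0.6, 0.2, 0.2],
]
regions = {
    "forehead": (0, 0, 2, 4),
    "left_cheek": (2, 0, 4, 2),
    "right_cheek": (2, 2, 4, 4),
}
scores = region_attention_scores(heatmap, regions)
```

Here the left cheek receives roughly three quarters of the attention mass and the forehead none, so a patch-based phase guided by these scores would emphasize the left-cheek subnetwork.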
In practice
- Use SPIGA for robust facial landmark detection.
- MobileNetV2 offers efficient feature extraction.
- Concatenate patch embeddings for final classification.
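The last point, concatenating patch embeddings for a final decision, can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the region list, the choice to scale each embedding by its attention score before concatenation, and the logistic classifier with made-up weights are all hypothetical.

```python
import math

# Fixed region order so the fused vector has a stable layout (hypothetical names).
REGION_ORDER = ["forehead", "eyes", "nose", "mouth"]


def fuse_patch_embeddings(patch_embeddings, attention_scores):
    """Scale each region's embedding by its attention score and concatenate."""
    fused = []
    for name in REGION_ORDER:
        weight = attention_scores[name]
        fused.extend(weight * x for x in patch_embeddings[name])
    return fused


def attack_probability(features, weights, bias):
    """Logistic classifier over the fused feature vector."""
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 2-D embeddings per region (assumption).
patch_embeddings = {
    "forehead": [1.0, 0.0],
    "eyes": [0.0, 1.0],
    "nose": [1.0, 1.0],
    "mouth": [2.0, 2.0],
}
attention_scores = {"forehead": 0.5, "eyes": 0.25, "nose": 0.25, "mouth": 0.0}

fused = fuse_patch_embeddings(patch_embeddings, attention_scores)
probability = attack_probability(fused, weights=[0.0] * len(fused), bias=0.0)
```

In a real system the classifier weights would be learned; the all-zero weights here only demonstrate the data flow from per-region embeddings to a single score.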
Topics
- Disguise Makeup Detection
- Face Anti-Spoofing
- Presentation Attack Detection
- Attention-Guided Learning
- Patch-Based Framework
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.