Generalized Disguise Makeup Presentation Attack Detection Using an Attention-Guided Patch-Based Framework

2026-04-30 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

Researchers from Shahid Beheshti University propose a generalized framework for detecting disguise makeup presentation attacks, which are hard to catch because cosmetics and prosthetics alter facial appearance realistically. The framework has two phases. In the first, a style-invariant full-face model, trained with metric learning and enhanced by a whitening transformation, produces region attention scores via Grad-CAM. In the second, those scores guide a patch-based analysis in which region-specific subnetworks, also trained with metric learning, perform fine-grained discrimination. The team also collected a new, diverse dataset of live and disguise makeup faces captured under real-world conditions. The reported results show strong generalization on both the collected dataset and SIW-Mv2: 8.97% ACER and 9.76% EER on the collected dataset, and on SIW-Mv2 0% ACER on Obfuscation and Impersonation attacks and 1.34% on Cosmetics attacks, outperforming prior works.
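
For readers less familiar with the two metrics quoted above, the sketch below shows how ACER and EER are commonly computed from classifier scores. ACER is the average of APCER (attacks accepted as live) and BPCER (live faces rejected as attacks). This is a minimal illustration, not the paper's evaluation code: the function names, the convention that higher scores mean "attack", and the pooling of all attacks into one type are assumptions made here.

```python
def apcer_bpcer(scores, labels, threshold):
    """Return (APCER, BPCER) at a given threshold.

    Assumed convention: higher score = more attack-like,
    label 1 = attack, label 0 = live (bona fide).
    All attacks are pooled into a single attack type.
    """
    attacks = [s for s, y in zip(scores, labels) if y == 1]
    lives = [s for s, y in zip(scores, labels) if y == 0]
    # APCER: fraction of attacks that fall below the threshold (accepted as live).
    apcer = sum(s < threshold for s in attacks) / len(attacks)
    # BPCER: fraction of live samples at or above the threshold (flagged as attack).
    bpcer = sum(s >= threshold for s in lives) / len(lives)
    return apcer, bpcer


def acer(scores, labels, threshold):
    """ACER is the mean of APCER and BPCER at the chosen threshold."""
    apcer, bpcer = apcer_bpcer(scores, labels, threshold)
    return (apcer + bpcer) / 2


def eer(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Returns (error_rate, threshold) at the point where APCER and BPCER
    are closest; the reported rate is their average at that point.
    """
    candidates = sorted(set(scores)) + [max(scores) + 1]
    best = None
    for t in candidates:
        apcer, bpcer = apcer_bpcer(scores, labels, t)
        gap = abs(apcer - bpcer)
        if best is None or gap < best[0]:
            best = (gap, (apcer + bpcer) / 2, t)
    return best[1], best[2]
```

Calling `acer(scores, labels, 0.5)` on a held-out set gives the error at a fixed operating point, while `eer(scores, labels)` finds the threshold where the two error types balance. The summary does not say which thresholding protocol the authors used.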

Key takeaway

For research scientists developing robust face anti-spoofing systems, the reported results suggest that combining global attention with localized, patch-based analysis and metric learning improves detection of sophisticated disguise makeup attacks. Two-phase, attention-guided architectures and style-invariant feature extraction are worth considering when generalization across diverse presentation attacks matters, especially for attacks involving subtle or localized alterations.

Key insights

A two-phase, attention-guided, patch-based framework effectively detects realistic disguise makeup attacks.

Principles

Combine global context with local detail.
Metric learning improves embedding separability.
Style invariance enhances generalization.
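
To make the metric learning principle concrete, here is a minimal sketch of a triplet margin loss, one common metric learning objective. The summary does not say which loss the authors use, so the choice of triplet loss, the function names, and the margin value are all assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet margin loss (assumed objective, not confirmed by the source).

    anchor and positive share a class (e.g. both live faces);
    negative comes from the other class (e.g. a disguise makeup face).
    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this loss pulls same-class embeddings together and pushes different-class embeddings apart, which is the separability the principle refers to.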

Method

A full-face network generates region attention scores via Grad-CAM. These scores guide patch-specific subnetworks that handle localized feature learning and classification. Both phases use metric learning and style-invariant features.
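
The attention step can be sketched as follows. Grad-CAM weights each feature map by the spatial mean of its gradient, sums the weighted maps, and applies a ReLU. How the paper turns the resulting heatmap into per-region scores is not described in this summary, so the mean pooling over region boxes and the normalization below are assumptions, as are the function names and the box format.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from nested lists shaped [channel][row][col].

    activations: feature maps of the chosen convolutional layer.
    gradients: gradient of the class score with respect to those maps.
    """
    n_ch = len(activations)
    h = len(activations[0])
    w = len(activations[0][0])
    # Channel weight = global average of that channel's gradient.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    return [
        [
            max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_ch)))
            for j in range(w)
        ]
        for i in range(h)
    ]


def region_scores(cam, regions):
    """Average the heatmap inside each region box and normalize to sum to 1.

    regions maps a name to (row_start, row_end, col_start, col_end),
    end-exclusive. This pooling scheme is a hypothetical choice.
    """
    raw = {}
    for name, (r0, r1, c0, c1) in regions.items():
        cells = [cam[i][j] for i in range(r0, r1) for j in range(c0, c1)]
        raw[name] = sum(cells) / len(cells)
    total = sum(raw.values())
    if total == 0:
        # No positive evidence anywhere: fall back to uniform scores.
        return {name: 1 / len(raw) for name in raw}
    return {name: value / total for name, value in raw.items()}
```

In a real pipeline the activations and gradients would come from a deep learning framework's hooks on the full-face network, and the region boxes from facial landmarks.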

In practice

Use SPIGA for robust facial landmark detection.
MobileNetV2 offers efficient feature extraction.
Concatenate patch embeddings for final classification.
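
The last point can be illustrated with a small sketch: per-region embeddings are concatenated in a fixed order and passed to a classifier head. The region names, the logistic head, and its weights are hypothetical stand-ins; the summary states only that patch embeddings are concatenated for final classification.

```python
import math

# Hypothetical region names; a fixed order keeps the fused vector layout stable.
REGION_ORDER = ("forehead", "eyes", "nose", "mouth")


def concat_embeddings(patch_embeddings, order=REGION_ORDER):
    """Concatenate per-region embedding vectors into one fused vector."""
    fused = []
    for name in order:
        fused.extend(patch_embeddings[name])
    return fused


def attack_probability(fused, weights, bias=0.0):
    """Logistic head over the fused vector (an assumed, simplified classifier)."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused embedding length")
    logit = sum(x * w for x, w in zip(fused, weights)) + bias
    return 1 / (1 + math.exp(-logit))
```

In the described framework each embedding would come from a region-specific subnetwork, with MobileNetV2 as the efficient feature extractor the list mentions. Here the embeddings are just lists of floats.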

Topics

Disguise Makeup Detection
Face Anti-Spoofing
Presentation Attack Detection
Attention-Guided Learning
Patch-Based Framework

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.