Architectural Bias in Face Presentation Attack Detection: A Comparative Study of Vision Transformers and Convolutional Neural Networks
Summary
A comparative empirical study of face Presentation Attack Detection (PAD) systems finds that Vision Transformer architectures substantially reduce demographic bias relative to convolutional neural networks. Experiments on the CASIA-SURF Cross-Ethnicity Face Anti-Spoofing (CeFA) dataset evaluated three models: a Multimodal ViT-Tiny, a ResNet18 CNN baseline, and a pretrained DeiT-S. DeiT-S achieved the highest overall accuracy (97.27%) and the lowest Equal Error Rate (EER, 0.86%), well above ResNet18's 90.15% accuracy. DeiT-S also narrowed the inter-ethnic Average Classification Error Rate (ACER) gap between African and East Asian subjects to 0.13 percentage points, an 83% reduction from a reported gap of 0.75 points. On Central Asian subjects, evaluated zero-shot, DeiT-S kept its bona fide presentation classification error rate (BPCER) at 2.89%, compared with 10.44% for ResNet18, a 3.6x advantage. These findings suggest that architectural choices, particularly the use of pretrained Vision Transformers, influence cross-demographic fairness in PAD systems.
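EER and BPCER follow the usual PAD conventions: APCER is the fraction of attack presentations wrongly accepted as bona fide, BPCER is the fraction of bona fide presentations wrongly rejected, and the EER is the error rate at the threshold where the two are equal. The sketch below assumes a detector whose score is higher for bona fide samples and approximates the EER by sweeping every observed score as a threshold. The function names and the sweep strategy are illustrative assumptions, not the paper's evaluation code.

```python
def error_rates(bona_fide_scores, attack_scores, threshold):
    """Return (APCER, BPCER) at one decision threshold.

    Assumption: higher scores mean "more likely bona fide", and a
    presentation is accepted as bona fide when score >= threshold.
    """
    # APCER: fraction of attack presentations wrongly accepted as bona fide.
    apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    # BPCER: fraction of bona fide presentations wrongly rejected as attacks.
    bpcer = sum(s < threshold for s in bona_fide_scores) / len(bona_fide_scores)
    return apcer, bpcer


def equal_error_rate(bona_fide_scores, attack_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold) for the threshold where |APCER - BPCER| is
    smallest; the EER is reported as the mean of the two rates there.
    """
    best = None
    for t in sorted(set(bona_fide_scores) | set(attack_scores)):
        apcer, bpcer = error_rates(bona_fide_scores, attack_scores, t)
        diff = abs(apcer - bpcer)
        if best is None or diff < best[0]:
            best = (diff, (apcer + bpcer) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only (not CeFA data).
bona_fide = [0.9, 0.8, 0.7, 0.4]
attacks = [0.1, 0.2, 0.3, 0.6]
eer, thr = equal_error_rate(bona_fide, attacks)
print(f"EER = {eer:.2f} at threshold {thr}")  # EER = 0.25 at threshold 0.6
```

With real score distributions, a finer threshold grid or interpolation between neighbouring thresholds gives a smoother EER estimate than this simple sweep.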
Key takeaway
If you design or deploy face Presentation Attack Detection systems, prioritize pretrained Vision Transformer architectures. In this study, DeiT-S achieved the highest accuracy (97.27%) and cut the inter-ethnic ACER gap by 83%. On a demographic group unseen during training, it also kept BPCER 3.6x lower than ResNet18, which points to better security and equity in biometric authentication.
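The two headline ratios follow directly from the reported figures. This short check uses only numbers stated in the summary: the 83% is the relative reduction of the African vs East Asian ACER gap, and the 3.6x is ResNet18's BPCER divided by DeiT-S's BPCER on zero-shot Central Asian subjects.

```python
def relative_reduction(before, after):
    """Fractional reduction when a quantity drops from `before` to `after`."""
    return (before - after) / before


# Inter-ethnic (African vs East Asian) ACER gap in percentage points:
# the reported 0.75 gap versus 0.13 with DeiT-S.
gap_reduction = relative_reduction(0.75, 0.13)

# BPCER (%) on zero-shot Central Asian subjects: ResNet18 vs DeiT-S.
bpcer_ratio = 10.44 / 2.89

print(f"ACER gap reduction: {gap_reduction:.0%}")  # ACER gap reduction: 83%
print(f"BPCER ratio: {bpcer_ratio:.1f}x")  # BPCER ratio: 3.6x
```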
Key insights
Pretrained Vision Transformers significantly reduce demographic bias and improve generalization in face Presentation Attack Detection.
Principles
- Pretrained Vision Transformers achieve superior PAD accuracy.
- They produce smaller demographic performance gaps.
- They generalize more equitably across unseen groups.
Method
Conducted a comparative empirical investigation of Multimodal ViT-Tiny, ResNet18, and pretrained DeiT-S architectures on the CASIA-SURF CeFA dataset.
In practice
- Consider pretrained Vision Transformers for PAD systems.
- Evaluate PAD systems for inter-ethnic ACER gaps.
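Evaluating a PAD system for inter-ethnic ACER gaps amounts to computing ACER, the mean of APCER and BPCER, separately for each demographic group at the operating threshold and comparing the results. The sketch below assumes higher scores mean "more likely bona fide", one global threshold shared by all groups, and a record format of `(group, is_attack, score)`. The group keys "AF" and "EA" and the function names are hypothetical illustrations, not the paper's code or data.

```python
from collections import defaultdict


def acer_by_group(records, threshold):
    """Compute ACER = (APCER + BPCER) / 2 separately for each demographic group.

    `records` is an iterable of (group, is_attack, score) tuples. Assumption:
    higher scores mean "more likely bona fide", one global threshold is shared
    by all groups, and every group has both attack and bona fide samples.
    """
    # Per group: [attacks, attacks accepted, bona fide, bona fide rejected]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for group, is_attack, score in records:
        accepted = score >= threshold
        c = counts[group]
        if is_attack:
            c[0] += 1
            c[1] += int(accepted)
        else:
            c[2] += 1
            c[3] += int(not accepted)

    acers = {}
    for group, (n_att, n_att_acc, n_bf, n_bf_rej) in counts.items():
        if n_att == 0 or n_bf == 0:
            raise ValueError(f"group {group!r} needs both attack and bona fide samples")
        apcer = n_att_acc / n_att
        bpcer = n_bf_rej / n_bf
        acers[group] = (apcer + bpcer) / 2
    return acers


def max_acer_gap(acers):
    """Largest pairwise ACER difference across groups (same units as ACER)."""
    return max(acers.values()) - min(acers.values())


# Toy records with hypothetical group keys; not real CeFA results.
records = [
    ("AF", False, 0.9), ("AF", False, 0.3), ("AF", True, 0.2), ("AF", True, 0.1),
    ("EA", False, 0.8), ("EA", False, 0.7), ("EA", True, 0.2), ("EA", True, 0.1),
]
acers = acer_by_group(records, threshold=0.5)
print(acers)  # {'AF': 0.25, 'EA': 0.0}
print(max_acer_gap(acers))  # 0.25
```

Sharing one threshold across groups mirrors how a deployed system operates; calibrating a separate threshold per group would mask part of the disparity the evaluation is meant to expose.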
Topics
- Face Presentation Attack Detection
- Vision Transformers
- Demographic Bias
- Biometric Security
- DeiT-S
- ResNet18
Best for: Research Scientist, AI Scientist, Computer Vision Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.