Architectural Bias in Face Presentation Attack Detection: A Comparative Study of Vision Transformers and Convolutional Neural Networks

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy) · Depth: Expert, quick

Summary

A comparative empirical study of face Presentation Attack Detection (PAD) systems finds that Vision Transformer architectures show less demographic bias than convolutional neural networks. Experiments on the CASIA-SURF Cross-Ethnicity Face Anti-Spoofing (CeFA) dataset evaluated a Multimodal ViT-Tiny, a ResNet18 CNN baseline, and a pretrained DeiT-S. DeiT-S achieved the highest overall accuracy (97.27%, against 90.15% for ResNet18) and the lowest Equal Error Rate (EER) of 0.86%. DeiT-S also narrowed the gap in Average Classification Error Rate (ACER) between African and East Asian subjects to 0.13%, roughly 83% below a reported gap of 0.75%. On Central Asian subjects evaluated zero-shot, DeiT-S held the Bona Fide Presentation Classification Error Rate (BPCER) to 2.89% against ResNet18's 10.44%, about a 3.6x difference. These findings suggest that architectural choice, and pretrained Vision Transformers in particular, influences cross-demographic fairness in PAD systems.
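The derived figures in the summary (the 83% reduction and the 3.6x ratio) follow directly from the reported percentages. The short check below reproduces that arithmetic; the helper name `relative_reduction` and the variable names are illustrative, and only the numeric inputs come from the summary.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


# Inter-ethnic ACER gap (percentage points): reported 0.75 vs DeiT-S 0.13.
acer_gap_reduction = relative_reduction(0.75, 0.13)

# Zero-shot BPCER on Central Asian subjects: ResNet18 10.44 vs DeiT-S 2.89.
bpcer_ratio = 10.44 / 2.89

# Overall accuracy margin in percentage points: DeiT-S minus ResNet18.
accuracy_margin = 97.27 - 90.15

print(f"ACER gap reduction: {acer_gap_reduction:.1%}")
print(f"BPCER ratio: {bpcer_ratio:.2f}x")
print(f"Accuracy margin: {accuracy_margin:.2f} points")
```

Note that the 3.6x figure is a ratio of error rates on one group, not a ratio of accuracies, so it should be read alongside the absolute BPCER values.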

Key takeaway

AI Security Engineers designing or deploying face Presentation Attack Detection systems should consider prioritizing pretrained Vision Transformer architectures. In this study, DeiT-S reached 97.27% accuracy and narrowed the ACER gap between African and East Asian subjects to 0.13%, roughly 83% below a reported 0.75%. It also showed about 3.6x lower BPCER on a demographic group unseen during training, which points to gains in both security and equity for biometric authentication.

Key insights

Pretrained Vision Transformers significantly reduce demographic bias and improve generalization in face Presentation Attack Detection.

Principles

Method

Conducted a comparative empirical investigation of Multimodal ViT-Tiny, ResNet18, and pretrained DeiT-S architectures on the CASIA-SURF CeFA dataset.
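A per-group evaluation of this kind can be sketched as follows. This is a minimal illustration, not the study's code: it assumes binary decisions (1 = bona fide, 0 = attack), uses the common definition of ACER as the mean of APCER and BPCER, and runs on made-up toy data with hypothetical group names "A" and "B".

```python
from collections import defaultdict


def pad_error_rates(labels, preds):
    """Return (APCER, BPCER, ACER) for labels/preds with 1 = bona fide, 0 = attack."""
    attack_preds = [p for y, p in zip(labels, preds) if y == 0]
    bona_preds = [p for y, p in zip(labels, preds) if y == 1]
    # APCER: share of attack presentations accepted as bona fide.
    apcer = sum(p == 1 for p in attack_preds) / len(attack_preds)
    # BPCER: share of bona fide presentations rejected as attacks.
    bpcer = sum(p == 0 for p in bona_preds) / len(bona_preds)
    return apcer, bpcer, (apcer + bpcer) / 2


def acer_by_group(labels, preds, groups):
    """Compute ACER separately for each demographic group."""
    by_group = defaultdict(lambda: ([], []))
    for y, p, g in zip(labels, preds, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(p)
    return {g: pad_error_rates(ys, ps)[2] for g, (ys, ps) in by_group.items()}


# Toy data only (not from the CeFA dataset).
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

acer = acer_by_group(labels, preds, groups)
gap = max(acer.values()) - min(acer.values())
print(acer, gap)
```

The inter-group gap here is simply the spread between the highest and lowest per-group ACER; each group must contain at least one attack and one bona fide sample for the rates to be defined.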

In practice

Topics

Best for: Research Scientist, AI Scientist, Computer Vision Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.