[P] VeridisQuo - open-source deepfake detector that combines spatial + frequency analysis and shows you where the face was manipulated
Summary
A university project developed a deepfake detection system that combines spatial and frequency domain analysis. The system uses two parallel streams: an EfficientNet-B4 extracts spatial features, while a dedicated frequency module applies FFT and DCT to the input image and merges the two transforms' outputs via an MLP. The spatial and frequency streams are then concatenated and fed into a classification MLP, giving a model with approximately 25 million parameters. A key feature is the integration of GradCAM, which generates heatmaps showing where the model focuses when it flags a face, primarily around blending boundaries and jawlines. Trained for 7 epochs on 716K face images from the FaceForensics++ (C23) dataset, the model achieved ~96% accuracy on a test set of ~107K images, with very high recall for fakes but a ~7-8% false positive rate.
Key takeaway
For AI Scientists developing deepfake detection systems, integrating both spatial and frequency domain analysis can improve performance against sophisticated fakes. Consider incorporating interpretability tools like GradCAM to understand what the detector is responding to, especially when dealing with subtle artifacts. Be mindful of false positive rates, as this model showed a tendency to misclassify real videos as fake in real-world conditions.
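The gap between high recall and a noticeable false positive rate is easier to see with a confusion matrix. The following is a minimal sketch with made-up counts (they are not the project's results), treating "fake" as the positive class; the function name `detection_metrics` is hypothetical.

```python
def detection_metrics(tp, fn, fp, tn):
    """Accuracy, recall and false positive rate, with 'fake' as the positive class.

    tp: fakes flagged as fake      fn: fakes missed
    fp: real faces flagged as fake tn: real faces passed as real
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),               # share of fakes that were caught
        "false_positive_rate": fp / (fp + tn),  # share of real faces wrongly flagged
    }


# Illustrative counts only (assumed, not taken from the project).
metrics = detection_metrics(tp=990, fn=10, fp=75, tn=925)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts, a detector can miss very few fakes and still wrongly flag a meaningful share of real faces, which is why accuracy alone is not enough to judge it.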
Key insights
Combining spatial and frequency domain analysis enhances deepfake detection, especially for high-quality fakes.
Principles
- Deepfake traces exist in both pixel and frequency domains.
- Fusion of diverse feature types improves detection robustness.
Method
A two-stream architecture processes face crops in parallel using EfficientNet-B4 for spatial features and a frequency module (FFT, DCT) for spectral inconsistencies, followed by concatenation and MLP classification.
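The data flow of such a two-stream design can be sketched in NumPy. This is an illustration under stated assumptions, not the project's implementation: the spatial embedding is a random stand-in for a backbone output (1792 is the pooled feature size commonly reported for EfficientNet-B4), the frequency features are simply low-order log-magnitude FFT and DCT coefficients joined directly (the source merges them with an MLP), and the classifier weights are untrained. The names `dct2_matrix`, `frequency_features` and `mlp_head` are hypothetical.

```python
import numpy as np


def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m


def frequency_features(gray, keep=8):
    """Log-magnitude FFT and DCT coefficients from the top-left keep x keep block."""
    h, w = gray.shape
    fft_mag = np.log1p(np.abs(np.fft.fft2(gray)))
    dct = dct2_matrix(h) @ gray @ dct2_matrix(w).T  # separable 2D DCT-II
    dct_mag = np.log1p(np.abs(dct))
    return np.concatenate([fft_mag[:keep, :keep].ravel(),
                           dct_mag[:keep, :keep].ravel()])


def mlp_head(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid 'fake' probability."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    logit = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
face = rng.random((64, 64))          # stand-in for a grayscale face crop
spatial = rng.standard_normal(1792)  # stand-in for the backbone embedding
freq = frequency_features(face)
fused = np.concatenate([spatial, freq])  # fusion by concatenation

# Untrained random weights: this only demonstrates shapes and data flow.
w1 = rng.standard_normal((fused.size, 32)) * 0.01
b1 = np.zeros(32)
w2 = rng.standard_normal(32) * 0.01
p_fake = mlp_head(fused, w1, b1, w2, 0.0)
print(fused.shape, float(p_fake))
```

Concatenation keeps the two feature types separate until the classifier, so the MLP can learn how much weight to give pixel-level versus spectral evidence.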
In practice
- Use GradCAM for model interpretability in deepfake detection.
- DCT features are effective for compression artifact detection.
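The core GradCAM computation behind such heatmaps is small enough to sketch directly. The example below assumes the activations of one convolutional layer and the gradients of the "fake" score with respect to them have already been extracted (for instance with framework hooks, which are not shown); the function name `grad_cam` and the toy arrays are hypothetical.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one conv layer.

    activations, gradients: arrays of shape (channels, H, W).
    Returns an (H, W) map scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))             # average gradient per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels
    cam = np.maximum(cam, 0.0)                        # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy input: channel 0 fires at the centre and supports the 'fake' score,
# channel 1 fires in a corner and counts against it.
acts = np.zeros((2, 3, 3))
acts[0, 1, 1] = 2.0
acts[1, 0, 0] = 1.0
grads = np.stack([np.full((3, 3), 0.5), np.full((3, 3), -1.0)])

heatmap = grad_cam(acts, grads)
print(heatmap)
```

In a real pipeline the low-resolution map is upsampled to the input size and overlaid on the face crop, which is how regions such as blending boundaries become visible.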
Topics
- Deepfake Detection
- Multimodal Learning
- EfficientNet
- Frequency Domain Analysis
- GradCAM
Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.