[P] VeridisQuo - open-source deepfake detector that combines spatial + frequency analysis and shows you where the face was manipulated

2026-03-07 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Intermediate, quick

Summary

A university project developed a deepfake detection system that combines spatial and frequency-domain analysis. The system uses two parallel streams: an EfficientNet-B4 extracts spatial features, while a dedicated frequency module applies FFT and DCT to the input images and merges the two outputs via an MLP. The spatial and frequency streams are then concatenated and fed into a classification MLP, giving a model with approximately 25 million parameters. A key feature is the integration of GradCAM, which generates heatmaps showing where the model focuses, primarily around blending boundaries and jawlines. Trained on 716K face images from the FaceForensics++ (C23) dataset for 7 epochs, the model achieved ~96% accuracy on a test set of ~107K images, with very high recall for fakes but a ~7-8% false positive rate (real faces flagged as fake).

Key takeaway

For AI scientists developing deepfake detection systems, combining spatial and frequency-domain analysis can improve performance against sophisticated fakes. Consider adding interpretability tools such as GradCAM to see what the detector relies on, especially when the artifacts are subtle. Watch the false positive rate: this model showed a tendency to classify real videos as fake in real-world conditions.
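
To make the false positive concern concrete, here is a minimal sketch of how accuracy, recall, and false positive rate are derived from a confusion matrix. The counts are hypothetical and are not the project's results; they assume a test set in which fakes outnumber real faces, which the summary does not state. They illustrate how overall accuracy can look strong while a noticeable share of real faces is still flagged.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, recall, and false positive rate.

    Convention assumed here: "positive" means "fake".
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),               # fakes correctly flagged
        "false_positive_rate": fp / (fp + tn),  # real faces flagged as fake
    }


# Hypothetical counts for illustration only (9,000 fakes, 2,000 reals).
metrics = binary_metrics(tp=8_900, fp=150, tn=1_850, fn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the real class is the minority in this made-up split, 150 misclassified real faces barely move the accuracy figure yet amount to a 7.5% false positive rate, so reporting the rate per class is more informative than accuracy alone.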

Key insights

Combining spatial and frequency domain analysis enhances deepfake detection, especially for high-quality fakes.

Principles

Deepfake traces exist in both pixel and frequency domains.
Fusion of diverse feature types improves detection robustness.

Method

A two-stream architecture processes face crops in parallel using EfficientNet-B4 for spatial features and a frequency module (FFT, DCT) for spectral inconsistencies, followed by concatenation and MLP classification.
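
The frequency stream and the fusion step described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the project's code: the function names, the 32x32 crop size, and the random stand-in for the EfficientNet-B4 feature vector are all hypothetical, and the learned MLPs (the one that merges FFT and DCT outputs and the final classifier) are left out.

```python
import numpy as np


def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return mat


def frequency_features(gray):
    """Flattened log-magnitude FFT and 2D DCT of a square grayscale crop."""
    n = gray.shape[0]
    spectrum = np.fft.fftshift(np.fft.fft2(gray))  # zero frequency at center
    fft_mag = np.log1p(np.abs(spectrum))           # compress dynamic range
    d = dct2_matrix(n)
    dct = d @ gray @ d.T                           # separable 2D DCT-II
    return np.concatenate([fft_mag.ravel(), dct.ravel()])


def fuse(spatial_vec, freq_vec):
    """Concatenate the two streams; a classifier MLP would consume this."""
    return np.concatenate([spatial_vec, freq_vec])


rng = np.random.default_rng(0)
crop = rng.random((32, 32))   # hypothetical grayscale face crop
spatial = rng.random(1792)    # placeholder for the CNN's pooled features
fused = fuse(spatial, frequency_features(crop))
print(fused.shape)
```

Building the DCT from an explicit basis matrix keeps the sketch free of extra dependencies; a real pipeline would more likely use a library DCT and learn the projection from these raw spectra into a compact feature vector.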

In practice

Use GradCAM for model interpretability in deepfake detection.
DCT features are effective for compression artifact detection.
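
As a rough illustration of what GradCAM computes, the sketch below implements its core step in NumPy: channel weights are the spatially averaged gradients of the class score, and the heatmap is the ReLU of the weighted sum of activation maps. The arrays here are hypothetical toy values; in a real detector they would come from a convolutional layer via framework hooks, which this sketch does not cover.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Core Grad-CAM computation for a single image.

    activations: (channels, H, W) feature maps from one conv layer.
    gradients:   (channels, H, W) gradient of the class score with
                 respect to those feature maps.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))              # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)   # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                         # keep positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: channel 0 supports the "fake" score, channel 1 opposes it.
acts = np.zeros((2, 4, 4))
acts[0, 1, 2] = 1.0
acts[1, 3, 3] = 1.0
grads = np.stack([np.ones((4, 4)), -np.ones((4, 4))])
heat = grad_cam(acts, grads)
print(heat)
```

The resulting low-resolution map is normally upsampled to the input size and overlaid on the face crop, which is how highlighted regions such as blending boundaries become visible.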

Topics

Deepfake Detection
Multimodal Learning
EfficientNet
Frequency Domain Analysis
GradCAM

Code references

VeridisQuo-orga/VeridisQuo

Best for: Computer Vision Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.