Multi-Frequency Fusion for Robust Video Face Forgery Detection

· Source: Apple Machine Learning Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Advanced, quick

Summary

Researchers propose two face video forgery detectors, LFWS and LFWL, that reach high accuracy with far smaller models than methods built on wide or dual-stream backbones. Both build on the Xception baseline, which has 21.9 million parameters. LFWS fuses a low-frequency Wavelet-Denoised Feature (WDF) with a phase-only Spatial-Phase Shallow Learning (SPSL) map, while LFWL fuses WDF with Local Binary Patterns (LBP). In both cases the fusion is performed by an additional 1x1 convolution module that adds only 292 parameters, so the total parameter count stays at roughly 21.9 million.

Key takeaway

If you are building real-time deepfake detection systems, consider lightweight fusion modules that combine handcrafted features: Wavelet-Denoised Features (WDF) paired with either Spatial-Phase Shallow Learning (SPSL) or Local Binary Patterns (LBP). This approach can raise accuracy with minimal parameter overhead, which helps models run efficiently on resource-constrained platforms without sacrificing detection performance.

Key insights

Lightweight fusion of handcrafted cues can significantly improve face forgery detection accuracy with minimal model overhead.
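
To put "minimal model overhead" in numbers, the short sketch below relates the 292 added parameters to the 21.9-million-parameter Xception baseline reported in the summary. It also shows the general parameter formula for a 1x1 convolution. The helper name `conv1x1_param_count` and the channel counts in the last call are hypothetical and chosen only for illustration; they are not the paper's configuration.

```python
def conv1x1_param_count(in_channels: int, out_channels: int, bias: bool = True) -> int:
    """Parameter count of a 1x1 convolution.

    One weight per (input channel, output channel) pair,
    plus one bias per output channel if bias is enabled.
    """
    return in_channels * out_channels + (out_channels if bias else 0)


BASELINE_PARAMS = 21_900_000  # Xception baseline, as reported in the summary
FUSION_PARAMS = 292           # added by the fusion module, as reported in the summary

overhead = FUSION_PARAMS / BASELINE_PARAMS
print(f"Relative overhead: {overhead:.6%}")  # Relative overhead: 0.001333%

# Hypothetical channel counts, only to show how the formula behaves.
print(conv1x1_param_count(in_channels=6, out_channels=3))  # 21
```

In other words, the fusion module accounts for about one parameter in every 75,000 of the full model.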

Principles

Method

Combine Wavelet-Denoised Features (WDF) with either Spatial-Phase Shallow Learning (SPSL) or Local Binary Patterns (LBP) using a 1x1 convolution.
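
The fusion idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `haar_lowpass` is a simple stand-in for a low-frequency wavelet-denoised map, `phase_only` approximates a phase-spectrum cue by discarding FFT magnitudes, `lbp8` is a basic 8-neighbour LBP, and the 1x1 convolution weights are random rather than learned. All function names, the 64x64 input, and the two-channel layout are hypothetical, and the sketch stops at the fused map without modelling how it feeds the Xception backbone.

```python
import numpy as np


def haar_lowpass(img):
    """2x2 block mean (proportional to the single-level Haar LL band),
    upsampled back to the cropped input size. Stand-in for a WDF-like map."""
    h, w = img.shape
    x = img[: h - h % 2, : w - w % 2]  # crop to even dimensions
    ll = (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 4.0
    return np.repeat(np.repeat(ll, 2, axis=0), 2, axis=1)


def phase_only(img):
    """Reconstruct the image from its FFT phase alone (unit magnitudes)."""
    phase = np.angle(np.fft.fft2(img))
    return np.real(np.fft.ifft2(np.exp(1j * phase)))


def lbp8(img):
    """Basic 8-neighbour local binary pattern with edge-replicated borders."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros((h, w), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        code += (neighbour >= img).astype(np.uint8) * np.uint8(1 << bit)
    return code


def conv1x1(x, weight, bias):
    """1x1 convolution: per-pixel linear mix of channels.
    x: (C, H, W), weight: (O, C), bias: (O,) -> (O, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def fuse(img, second_cue, weight, bias):
    """Stack the low-frequency map with a second cue and mix them with a 1x1 conv."""
    wdf = haar_lowpass(img)
    cue = np.asarray(second_cue(img), dtype=np.float64)
    cue = cue[: wdf.shape[0], : wdf.shape[1]]  # match the cropped size
    return conv1x1(np.stack([wdf, cue]), weight, bias)


rng = np.random.default_rng(0)
frame = rng.random((64, 64))      # stand-in for a grayscale face crop
weight = rng.normal(size=(1, 2))  # would be learned in a real model
bias = np.zeros(1)

lfws_like = fuse(frame, phase_only, weight, bias)                   # WDF + phase cue
lfwl_like = fuse(frame, lambda im: lbp8(im) / 255.0, weight, bias)  # WDF + LBP cue
print(lfws_like.shape, lfwl_like.shape)  # (1, 64, 64) (1, 64, 64)
```

Because a 1x1 convolution only mixes channels at each pixel, its parameter count depends on the number of input and output channels and not on the image size, which is why this kind of fusion stays cheap.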

Best for: AI Scientist, Research Scientist, AI Researcher, Deep Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Apple Machine Learning Research.