Venus-DeFakerOne: Unified Fake Image Detection & Localization
Summary
DeFakerOne is a data-centric foundation model for unified fake image detection and localization (FIDL). Image forgery techniques are converging, yet research on detecting them remains fragmented across domains, and DeFakerOne is designed to close that gap. Released on May 15, 2026, it integrates the InternVL2 and SAM2 architectures to perform image-level detection and pixel-level localization simultaneously across diverse scenarios, including AIGC, DeepFake, document, and natural-image manipulations. The model is trained on a curated dataset of 12.5 million samples spanning multiple forensic domains, and a closed-loop data generation pipeline supports continuous adaptation. In extensive experiments, DeFakerOne achieves state-of-the-art results, outperforming baselines on 39 forgery detection benchmarks and 9 localization benchmarks. It is also more robust to real-world perturbations and to images from advanced generators such as GPT-Image-2, reaching 95.77% accuracy on the challenging GPT-Image-2-Bench.
Key takeaway
For Computer Vision Engineers and Research Scientists building robust anti-forgery systems, DeFakerOne's unified approach shows that increasing data volume alone is insufficient. Prioritize balanced, operation-aware data composition and multi-granularity supervision, especially pixel-level masks for fine-grained manipulations. Make sure your visual backbone preserves high-resolution local evidence: the stronger compression in newer VLMs can dilute critical forensic artifacts, degrading detection and localization against advanced generative models such as GPT-Image-2.
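The summary does not say how DeFakerOne balances its 12.5 million samples. As one illustration of operation-aware composition, the sketch below groups samples by a hypothetical `(domain, operation)` key and draws groups uniformly, so a single large group cannot dominate training. The field names and the uniform mixing rule are assumptions, not the paper's recipe.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(samples, k, seed=0):
    """Draw k samples so every (domain, operation) group is equally likely.

    Hypothetical helper: uniform-over-groups mixing is an assumption for
    illustration, not the documented DeFakerOne data recipe.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[(s["domain"], s["operation"])].append(s)
    keys = sorted(groups)
    rng = random.Random(seed)
    # Pick a group uniformly at random, then a sample uniformly inside it.
    return [rng.choice(groups[rng.choice(keys)]) for _ in range(k)]


if __name__ == "__main__":
    # Skewed toy pool: 900 AIGC inpaintings, 90 face swaps, 10 document splices.
    pool = (
        [{"domain": "AIGC", "operation": "inpaint"}] * 900
        + [{"domain": "DeepFake", "operation": "faceswap"}] * 90
        + [{"domain": "document", "operation": "splice"}] * 10
    )
    batch = balanced_sample(pool, 3000)
    print(Counter((s["domain"], s["operation"]) for s in batch))
```

Each group ends up with roughly a third of the batch despite the 90:9:1 skew in the pool. The trade-off is heavy repetition of rare groups; a temperature-scaled mix between proportional and uniform sampling is a common middle ground.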
Key insights
Unified fake image detection and localization requires balanced multi-domain data and fine-grained supervision.
Principles
- Data scaling alone does not guarantee FIDL performance.
- Cross-domain transfer is driven by operation-level artifact similarity.
- Preserve original-resolution artifacts, which carry crucial forensic evidence.
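To see why resolution matters, compare resizing a whole image into one backbone input with tiling it at native resolution. The sketch below assumes a hypothetical 448-pixel input side; the summary does not state DeFakerOne's actual input handling, so treat this only as an illustration of the principle.

```python
import math

TILE = 448  # assumed input side length of the vision encoder (hypothetical)


def downscale_factor(width, height, tile=TILE):
    """Linear shrink factor if the whole image is resized to fit one tile."""
    return max(width, height) / tile


def tile_boxes(width, height, tile=TILE):
    """Cover an image with tile x tile crops taken at native resolution.

    Crops are spread evenly so the last one ends exactly at the image border;
    neighbours may overlap slightly, but no pixel is resampled. A side shorter
    than `tile` yields a single crop clipped to the image.
    """
    def starts(length):
        n = max(1, math.ceil(length / tile))
        if n == 1:
            return [0]
        step = (length - tile) / (n - 1)
        return [round(i * step) for i in range(n)]

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

For a 1920×1080 image, `tile_boxes` returns 15 full-size crops, whereas a single resize shrinks the image about 4.3× and can average away pixel-level traces. Tiling keeps those traces at the cost of more visual tokens per image.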
Method
DeFakerOne cascades an MLLM-based perception-and-detection module (InternVL2) with a SAM2-based segmentation module. Detection is posed as visual question answering using dynamic VQA templates, and the MLLM generates segmentation tokens that are used for pixel-level localization.
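A minimal sketch of the two ideas in this paragraph, under stated assumptions: "dynamic" templates are read here as sampling a fresh question/answer phrasing per training example, and the segmentation token is handled in the style of reasoning-segmentation models such as LISA, where hidden states at a special token prompt the mask decoder. The template wording, the `[SEG]` token name, and both helper functions are hypothetical.

```python
import random

# Hypothetical templates: the exact DeFakerOne prompts are not given in this summary.
SEG_TOKEN = "[SEG]"
QUESTIONS = [
    "Is this image real or fake?",
    "Has this image been generated or manipulated? Answer real or fake.",
    "Inspect the image for forgery traces. Is it authentic?",
]
ANSWERS = {
    "real": ["It is real.", "The image appears authentic."],
    "fake": [
        f"It is fake. {SEG_TOKEN}",
        f"The image is forged; the tampered region is {SEG_TOKEN}.",
    ],
}


def make_vqa_pair(label, rng):
    """Sample a fresh (question, answer) pair for an image-level label."""
    return rng.choice(QUESTIONS), rng.choice(ANSWERS[label])


def seg_embeddings(tokens, hidden_states):
    """Collect the hidden states at SEG_TOKEN positions.

    In this sketch these vectors would be projected into prompts for the
    SAM2-style mask decoder; an empty list means no mask is requested.
    """
    return [h for t, h in zip(tokens, hidden_states) if t == SEG_TOKEN]
```

Varying the phrasing keeps the model from keying on one fixed prompt, and tying localization to an emitted token lets a single autoregressive pass decide both whether and where to segment.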
In practice
- Curate diverse datasets covering multiple forgery domains.
- Implement closed-loop data generation for continuous model adaptation.
- Train classification and segmentation jointly, so that local manipulations receive pixel-level supervision.
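Joint training can be sketched as one loss with an image-level term and an optional pixel-level term, which also illustrates multi-granularity supervision: samples without masks still contribute the classification signal. The BCE + Dice combination and the `w_seg` weight are assumptions for illustration; the summary does not specify DeFakerOne's loss.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and target y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def dice_loss(pred, target, eps=1.0):
    """Soft Dice loss over flattened per-pixel probabilities."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1 - (2 * inter + eps) / (sum(pred) + sum(target) + eps)


def joint_loss(fake_prob, label, mask_pred=None, mask_gt=None, w_seg=1.0):
    """Image-level BCE plus pixel-level BCE + Dice when a mask is available.

    Samples without pixel annotations contribute only the classification
    term, so image-labeled and mask-labeled data can share one batch.
    """
    loss = bce(fake_prob, label)
    if mask_pred is not None and mask_gt is not None:
        pix = sum(bce(p, t) for p, t in zip(mask_pred, mask_gt)) / len(mask_gt)
        loss += w_seg * (pix + dice_loss(mask_pred, mask_gt))
    return loss
```

A perfectly predicted mask adds almost nothing to the image-level loss, while an inverted mask adds a large penalty, so gradients from the segmentation branch concentrate on samples whose local evidence is being missed.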
Topics
- Unified FIDL
- DeFakerOne
- InternVL2-SAM2 Architecture
- Generative AI Forgery
- Cross-Domain Artifacts
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.