DRIFT: From Robustness Gaps to Invariance Manifolds for AI-Generated Image Detection

Source: cs.CV updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computer Vision & Image Processing) · Depth: Expert, extended

Summary

DRIFT is an AI-generated image detection framework from Samsung Research Institute that addresses the limitations of existing training-free methods by learning a structured invariance manifold of real images. It uses a frozen DINOv2 ViT-B/14 backbone with two lightweight projection heads, robust and fragile, to decompose the representation space. The robust subspace suppresses variation caused by physically plausible transformations, while the fragile subspace stays sensitive to edit-like perturbations. An ordering margin of γ = 0.3 enforces a hierarchical separation between the two subspaces, so detection reduces to a margin-violation test. An EMA teacher (momentum 0.996) and a reconstruction anchor (weight 0.1) stabilize training on real-only data. Reported experiments show strong open-world generalization: a mean ACC/AP of approximately 97.8/99.8 on ForenSynth, consistently high accuracy on Diffusion-6cls, and, on PromptWorld-1K, 93.2% ACC / 92.0% AP for Gemini and 94.8% ACC / 95.0% AP for ChatGPT. The framework also produces interpretable patch-wise localization heatmaps.
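To make the EMA teacher concrete, here is a minimal sketch of the standard exponential-moving-average update with the momentum of 0.996 quoted above. The flat list of weights and the function name `ema_update` are illustrative assumptions; the summary does not say which parameters the teacher tracks.

```python
def ema_update(teacher, student, momentum=0.996):
    """Standard EMA step: teacher <- m * teacher + (1 - m) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


# Toy weights (hypothetical): the teacher drifts slowly toward a fixed student.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
for _ in range(100):
    teacher = ema_update(teacher, student)

# After n steps toward a constant student value of 1.0, a weight that started
# at 0.0 equals 1 - momentum**n, so the teacher lags the student by design.
print(teacher)
```

With a momentum this close to 1, the teacher changes by only 0.4% of the gap per step, which is what makes it a slow, stable target.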

Key takeaway

Machine learning engineers building robust AI-generated image detectors should consider a structured invariance learning approach. By explicitly modeling the real-image manifold with robust and fragile representation subspaces, it generalizes better in open-world settings than fixed robustness-gap techniques. Use an EMA teacher and a reconstruction anchor to stabilize training on real-only datasets, and use patch-wise drift maps for both detection and interpretable localization of synthetic content.
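A patch-wise drift map can be pictured with the sketch below. It assumes, purely for illustration, that drift is the cosine distance between each patch embedding of an image and the matching patch embedding of a perturbed view; the summary does not specify the actual distance or aggregation. The one grounded detail is the grid size: a ViT-B/14 backbone on a 224×224 input yields 224 / 14 = 16 patches per side.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity, with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + 1e-12)


def patch_drift_map(patches_a, patches_b, grid_w):
    """Per-patch drift between two views, laid out as rows of grid_w patches."""
    drifts = [cosine_distance(a, b) for a, b in zip(patches_a, patches_b)]
    return [drifts[i:i + grid_w] for i in range(0, len(drifts), grid_w)]


def image_score(drift_map):
    """Assumed aggregation: mean drift over all patches."""
    flat = [d for row in drift_map for d in row]
    return sum(flat) / len(flat)


# Toy 2x2 grid with 2-D embeddings (hypothetical); only the last patch moves.
view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
view_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 1.0]]
heatmap = patch_drift_map(view_a, view_b, grid_w=2)
score = image_score(heatmap)
print(heatmap, score)
```

In this toy case the heatmap is near zero everywhere except the bottom-right patch, which is the kind of localized signal a heatmap is meant to expose.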

Key insights

AI-generated image detection improves by learning a structured invariance manifold of real images using robust and fragile representation subspaces.

Principles

Method

Train projection heads on a frozen vision foundation model (VFM) using real-only data, enforcing robust invariance, fragile sensitivity, and an ordering margin, with an EMA teacher and a reconstruction loss for stability. Detect fakes as margin violations.
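The ordering margin and the margin-violation test can be sketched as a hinge, as below. This is one plausible reading, not the paper's stated objective: it assumes `d_robust` and `d_fragile` are scalar drifts measured in the two subspaces, that real images should satisfy `d_fragile - d_robust >= gamma`, and that the fragile-sensitivity term is folded into the hinge. Only γ = 0.3 and the reconstruction weight of 0.1 come from the summary.

```python
GAMMA = 0.3          # ordering margin quoted in the summary
RECON_WEIGHT = 0.1   # reconstruction anchor weight quoted in the summary


def ordering_hinge(d_robust, d_fragile, gamma=GAMMA):
    """Zero when fragile drift exceeds robust drift by at least gamma."""
    return max(0.0, gamma + d_robust - d_fragile)


def total_loss(d_robust, d_fragile, recon_error,
               gamma=GAMMA, recon_weight=RECON_WEIGHT):
    """Assumed combination: invariance + ordering hinge + weighted reconstruction."""
    invariance = d_robust  # robust features should barely move
    ordering = ordering_hinge(d_robust, d_fragile, gamma)
    return invariance + ordering + recon_weight * recon_error


def is_margin_violation(d_robust, d_fragile, gamma=GAMMA):
    """Detection rule under this sketch: flag an image when the ordering fails."""
    return ordering_hinge(d_robust, d_fragile, gamma) > 0.0


# Hypothetical drifts: the first pair respects the margin, the second does not.
print(is_margin_violation(0.05, 0.60))  # margin satisfied
print(is_margin_violation(0.25, 0.35))  # margin violated
print(total_loss(0.05, 0.60, recon_error=0.2))
```

Framing detection as the same hinge used in training means no fake images are needed at training time: the model only learns what ordering real images obey, and anything that breaks it is flagged.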

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.