DifferAD-R1: A Difference-Guided IndustrialAnomaly Localization with Multimodal LargeLanguage Models

2026-06-15 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

DifferAD-R1 is an MLLM-augmented reinforcement learning framework designed for industrial anomaly localization, specifically targeting the detection of unseen defect categories in industrial products. It addresses the limitations of traditional closed-set methods, which struggle with cross-scenario generalization, and existing MLLM-based approaches that use misaligned QA-style paradigms or ineffective optimization for subtle defects. DifferAD-R1 introduces a Difference-Guided dual-image paradigm, reframing localization as a one-shot difference grounding problem to explore cross-scenario anomalies. It also features a Dual-Consistency Localization Reward to improve optimization stability for hard-to-detect anomalies and integrates a difficulty-aware strategy with adaptive reweighting and group-wise resampling. For evaluation, the AD-DualDiff dataset was constructed, comprising 13K paired images across 20 categories. Experimental results show DifferAD-R1 significantly outperforms existing baselines and achieves competitive performance against large-scale models like Qwen3-VL (235B parameters).

Key takeaway

For Computer Vision Engineers developing industrial anomaly detection systems, DifferAD-R1 offers a robust approach to overcome generalization issues and detect subtle defects. You should consider its Difference-Guided dual-image paradigm and Dual-Consistency Localization Reward to enhance your models' performance on unseen defect categories. Its difficulty-aware strategy can also improve learning efficiency on challenging instances, potentially reducing false negatives in critical industrial applications.

Key insights

DifferAD-R1 uses a difference-guided MLLM-augmented RL framework to localize industrial anomalies, improving generalization and detection of subtle defects.

Principles

Reformulate localization as difference grounding.
Enhance optimization for subtle defects.
Prioritize learning on challenging instances.

Method

DifferAD-R1 employs a Difference-Guided dual-image paradigm, Dual-Consistency Localization Reward, and a difficulty-aware strategy with adaptive reweighting and group-wise resampling within an MLLM-augmented reinforcement learning framework.

In practice

One-shot difference grounding for anomalies.
Evaluate with 13K paired images.
Publicly available code for implementation.

Topics

Industrial Anomaly Localization
Multimodal Large Language Models
Reinforcement Learning
Computer Vision
Defect Detection
AD-DualDiff Dataset

Code references

Rong2026/work-1

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.