SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection

2026-04-29 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Advanced, quick

Summary

SynSur is an end-to-end pipeline that addresses the scarcity of labeled data in industrial defect detection by generating synthetic defect samples. It integrates Vision-Language-Model-based prompts, LoRA-adapted diffusion, mask-guided inpainting, and automatic label derivation with sample filtering. The pipeline is evaluated on a challenging dataset of pitting defects on ball screw drives (BSData) and on a subset of the Mobile phone screen surface defect segmentation dataset (MSD). The results show that synthetic data is not a replacement for real data, but that it can preserve performance and offer modest gains when combined with real datasets. The study also analyzes individual pipeline stages, such as prompt construction and sample filtering, using DreamSim and CLIPScore, and finds that the pipeline transfers across domains when given domain-specific adaptation.

Key takeaway

If you are developing industrial defect detection systems with limited real data, consider a synthetic data generation pipeline like SynSur to augment your training sets. Training on synthetic data alone is insufficient, but combining it with real data can preserve or modestly improve detector performance, provided you adapt the pipeline to your specific domain and verify annotation quality.

Key insights

Synthetic data generation can augment scarce real datasets for industrial defect detection, preserving detector performance and yielding modest gains when mixed with real data.

Principles

Synthetic data supplements scarce real datasets; it does not replace them.
Domain-specific adaptation is crucial for transferability.

Method

The SynSur pipeline combines VLM prompts, LoRA-adapted diffusion, mask-guided inpainting, and sample filtering with automatic label derivation to generate and annotate synthetic industrial defects.
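
The summary does not say how SynSur derives its labels. As a minimal sketch of one plausible approach, assuming the label is a bounding box taken from the binary inpainting mask, the derivation could look like the following. The function name mask_to_bbox and the list-of-rows mask representation are illustrative assumptions, not part of the published pipeline.

```python
from typing import List, Optional, Tuple

# Assumption: the inpainting mask is a binary grid (rows of 0/1),
# where 1 marks pixels that were inpainted with a synthetic defect.
Mask = List[List[int]]


def mask_to_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (x_min, y_min, x_max, y_max) covering all nonzero pixels.

    Coordinates are inclusive pixel indices. Returns None for an empty
    mask, which a pipeline could treat as "no defect to label".
    """
    # Rows that contain at least one masked pixel.
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not ys:
        return None
    # Column indices of every masked pixel across all rows.
    xs = [x for row in mask for x, value in enumerate(row) if value]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical 4x4 mask with a small defect region in the middle.
example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
example_bbox = mask_to_bbox(example_mask)
```

Because the mask is known at generation time, a label derived this way needs no manual annotation, though the generated defect may not fill the mask exactly, which is one reason a filtering step is useful.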

In practice

Use DreamSim and CLIPScore for sample filtering.
Combine synthetic data with real data for best results.
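
The two practices above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes each synthetic sample already has a precomputed DreamSim distance to real reference images (lower means more similar) and a precomputed CLIPScore against its prompt (higher means better agreement). The thresholds, the synthetic_fraction parameter, and all names are hypothetical and would need tuning per domain.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SyntheticSample:
    path: str
    dreamsim_distance: float  # assumed precomputed; lower = closer to real references
    clip_score: float         # assumed precomputed; higher = better prompt agreement


def filter_samples(
    samples: List[SyntheticSample],
    max_distance: float = 0.5,     # hypothetical threshold
    min_clip_score: float = 0.25,  # hypothetical threshold
) -> List[SyntheticSample]:
    """Keep samples that pass both the similarity and the prompt-agreement check."""
    return [
        s for s in samples
        if s.dreamsim_distance <= max_distance and s.clip_score >= min_clip_score
    ]


def mix_datasets(
    real_paths: List[str],
    synthetic_paths: List[str],
    synthetic_fraction: float = 0.3,  # hypothetical target share of synthetic data
    seed: int = 0,
) -> List[str]:
    """Keep every real sample and add synthetic ones up to the target share."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    # Solve n_syn / (n_real + n_syn) = synthetic_fraction for n_syn.
    wanted = round(len(real_paths) * synthetic_fraction / (1.0 - synthetic_fraction))
    n_syn = min(wanted, len(synthetic_paths))
    rng = random.Random(seed)
    mixed = list(real_paths) + rng.sample(synthetic_paths, n_syn)
    rng.shuffle(mixed)
    return mixed


candidates = [
    SyntheticSample("syn_0.png", dreamsim_distance=0.20, clip_score=0.31),
    SyntheticSample("syn_1.png", dreamsim_distance=0.70, clip_score=0.33),  # too dissimilar
    SyntheticSample("syn_2.png", dreamsim_distance=0.30, clip_score=0.10),  # poor prompt match
    SyntheticSample("syn_3.png", dreamsim_distance=0.40, clip_score=0.28),
]
kept = filter_samples(candidates)
real = [f"real_{i}.png" for i in range(7)]
train_set = mix_datasets(real, [s.path for s in kept], synthetic_fraction=0.3)
```

Keeping all real samples and capping the synthetic share reflects the finding that synthetic data works as a supplement rather than a substitute.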

Topics

SynSur Pipeline
Industrial Defect Detection
Synthetic Data Generation
Diffusion Models
Vision-Language Models

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.