Balancing Real and Synthetic Data for CNN-based Masonry Crack Detection

Source: Machine Learning · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition) · Depth: Intermediate, quick

Summary

Research on CNN-based masonry crack detection addresses the shortage of real-world training data by exploring synthetic data generation. The study used a real dataset of masonry crack images from Bologna and a synthetic dataset created with a crack overlay tool. Initial training on real data identified InceptionV4 as the best-performing model; six training scenarios were then tested, each with a different ratio of real to synthetic data. Evaluation on a test set of real images, using F1-score and mean Intersection over Union (mIoU), showed that training on synthetic data plus a modest 20% of real data achieved results comparable to training solely on real data and, in one case, better. Specifically, the scenario combining 20% real data with synthetic data yielded a 76% F1-score and 80% mIoU.
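For readers unfamiliar with the two metrics, here is a minimal sketch of how F1-score and mIoU can be computed for a binary crack/background prediction. It assumes predictions and labels are flattened sequences of 0/1 values and that mIoU is averaged over the two classes; the function names are illustrative, and the study's own evaluation code may differ.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count true positives, false positives, false negatives, true negatives."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """F1 for the crack class: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    # Assumption: with no crack pixels in either input, report a perfect score.
    return 2 * tp / denom if denom else 1.0


def mean_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Mean IoU over the crack and background classes (an assumed convention)."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    ious = []
    # Crack IoU uses TP as the intersection; background IoU uses TN.
    for intersection, union in ((tp, tp + fp + fn), (tn, tn + fp + fn)):
        if union:  # skip a class that is absent from both pred and target
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 1.0


# Toy example: six pixels, 1 = crack, 0 = background.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(f1_score(pred, target))   # 2*2 / (2*2 + 1 + 1) = 0.666...
print(mean_iou(pred, target))   # (2/4 + 2/4) / 2 = 0.5
```

Averaging IoU over both classes keeps the score from being dominated by the background, which typically covers most of a masonry image.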

Key takeaway

Machine Learning Engineers developing CNN models for structural health monitoring, particularly masonry crack detection, can reduce the burden of collecting extensive real-world data. Consider adding synthetically generated crack images to the training set: a scenario combining 20% real data with synthetic data achieved performance comparable or superior to training on real data alone (76% F1-score, 80% mIoU). This offers a practical path to robust models when real-world data is scarce.

Key insights

Synthetic data can reduce the amount of real data that must be collected for CNN-based crack detection while maintaining or improving F1-score and mIoU.

Method

Train several deep learning architectures on real data and select the best performer (e.g., InceptionV4), then retrain it with varying real/synthetic data ratios and evaluate each scenario on real test images using F1-score and mIoU.
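The ratio-varying step can be sketched as below. This is a minimal illustration, not the study's pipeline: the helper name `mix_datasets`, the file names, the fixed training-set size, and the list of fractions are all hypothetical, since the summary does not list the six ratios that were tested.

```python
import random
from typing import List, Sequence


def mix_datasets(
    real: Sequence[str],
    synthetic: Sequence[str],
    real_fraction: float,
    total: int,
    seed: int = 0,
) -> List[str]:
    """Draw a training set of `total` items with the requested share of real samples."""
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be between 0 and 1")
    n_real = round(total * real_fraction)
    n_synthetic = total - n_real
    if n_real > len(real) or n_synthetic > len(synthetic):
        raise ValueError("not enough samples for the requested mix")
    rng = random.Random(seed)  # fixed seed keeps each scenario reproducible
    mixed = rng.sample(list(real), n_real) + rng.sample(list(synthetic), n_synthetic)
    rng.shuffle(mixed)  # interleave real and synthetic samples
    return mixed


# Hypothetical file lists standing in for the two datasets.
real_images = [f"real_{i}.png" for i in range(100)]
synthetic_images = [f"synthetic_{i}.png" for i in range(100)]

# Illustrative fractions only; the study's actual six scenarios are not given here.
for fraction in (0.0, 0.2, 1.0):
    train_set = mix_datasets(real_images, synthetic_images, fraction, total=50)
    n_real = sum(name.startswith("real_") for name in train_set)
    print(f"real fraction {fraction:.1f}: {n_real} real, {len(train_set) - n_real} synthetic")
```

Holding the total size fixed across scenarios is a design choice made here so that differences in the scores reflect the mix rather than the amount of training data; each resulting set would then be used to train the selected model and scored on the real test images.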

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.