Balancing Real and Synthetic Data for CNN-based Masonry Crack Detection
Summary
Research on CNN-based masonry crack detection addresses the shortage of real-world training data by exploring synthetic data generation. The study used a real dataset of masonry crack images from Bologna and a synthetic dataset created with a crack overlay tool. After initial training on real data identified InceptionV4 as the best-performing model, six training scenarios with varying ratios of real to synthetic data were tested. Evaluation on a test set of real images, using F1-score and mean Intersection over Union (mIoU), showed that training on synthetic data plus a modest 20% of real data achieved results comparable to training solely on real data and, in one case, outperformed it. Specifically, the 20% real/80% synthetic scenario yielded a 76% F1-score and 80% mIoU.
Key takeaway
Machine learning engineers developing CNN models for structural health monitoring, particularly masonry crack detection, can significantly reduce the burden of collecting extensive real-world data. Consider integrating synthetically generated crack images: a scenario combining 20% real data with synthetic data achieved performance comparable or superior to using real data alone (76% F1-score, 80% mIoU). This approach offers a practical path to robust models with less dependence on real-world data.
Key insights
Synthetic data can significantly reduce real data collection needs for CNN-based crack detection while maintaining or improving accuracy.
Principles
- CNN performance depends on large, diverse datasets.
- Synthetic data effectively complements real-world data.
- The right ratio of real to synthetic data improves model accuracy.
Method
Train several deep learning architectures on real data to select the best-performing model (InceptionV4 in this study), then train it on varying ratios of real to synthetic data and evaluate each scenario on real test images with F1-score and mIoU.
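The ratio experiments can be sketched as a small sampling routine. This is a minimal illustration, not the study's pipeline: the function name `mix_datasets`, the file names, the training-set size, and the ratios in the loop are all assumptions made for the example, and the summary does not list the six scenarios the authors actually tested.

```python
import random


def mix_datasets(real, synthetic, real_fraction, total, seed=0):
    """Return a shuffled training list with the requested share of real samples.

    Hypothetical helper: `real` and `synthetic` are lists of sample
    identifiers (for example image/mask file names).
    """
    if not 0.0 <= real_fraction <= 1.0:
        raise ValueError("real_fraction must be in [0, 1]")
    n_real = round(total * real_fraction)
    n_synth = total - n_real
    if n_real > len(real) or n_synth > len(synthetic):
        raise ValueError("not enough samples for the requested mix")
    rng = random.Random(seed)  # fixed seed keeps each scenario reproducible
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synth)
    rng.shuffle(mixed)
    return mixed


# Placeholder file names standing in for real and synthetic crack images.
real = [f"real_{i:03d}.png" for i in range(100)]
synthetic = [f"synth_{i:03d}.png" for i in range(400)]

# Illustrative ratios only; swap in the scenarios you want to compare.
for real_fraction in (0.0, 0.2, 0.5, 1.0):
    train = mix_datasets(real, synthetic, real_fraction, total=100)
    n_real = sum(name.startswith("real_") for name in train)
    print(f"real_fraction={real_fraction:.1f}: "
          f"{n_real} real, {len(train) - n_real} synthetic")
```

Holding the total training-set size constant across scenarios, as this sketch does, is one possible design choice: it isolates the effect of the ratio from the effect of simply having more data. Whichever scheme is used, the test set should contain only real images that never appear in training.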
In practice
- Generate synthetic cracks using overlay tools.
- Test a 20% real / 80% synthetic data mix.
- Evaluate crack detection with F1-score and mIoU.
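For the evaluation step, the two metrics can be computed from pixel-level confusion counts. The sketch below assumes binary masks (1 = crack, 0 = background) stored as nested lists, and takes mIoU as the mean of the crack and background IoU values; the summary does not state exactly how the study computed either metric, so treat these definitions and the function names as assumptions.

```python
def confusion_counts(pred, truth):
    """Count pixel-level TP, FP, FN, TN for two binary masks of equal shape."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def f1_score(pred, truth):
    """F1 for the crack class: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mean_iou(pred, truth):
    """Mean of crack IoU and background IoU (assumed two-class definition)."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    ious = []
    for intersection, union in ((tp, tp + fp + fn), (tn, tn + fp + fn)):
        if union:  # skip a class absent from both masks
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 1.0


# Toy 3x4 masks: 1 marks a crack pixel.
truth = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]

print(f"F1   = {f1_score(pred, truth):.3f}")   # F1   = 0.667
print(f"mIoU = {mean_iou(pred, truth):.3f}")   # mIoU = 0.650
```

In the toy example the prediction has two true positives, one false positive, and one false negative, so the crack IoU is 0.5 and the background IoU is 0.8. Because cracks occupy few pixels, the background class tends to lift mIoU, which is one reason to report F1 alongside it.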
Topics
- CNNs
- Synthetic Data
- Crack Detection
- Masonry Inspection
- Deep Learning
- Computer Vision
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.