Bridging the Sim-to-Real Gap in Semiconductor Visual Program Synthesis via Input Binarization

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Quality Control & Standards · Depth: Expert, quick

Summary

The article proposes a visual program synthesis framework for semiconductor inspection that addresses two problems together: real training data is scarce, and generated data must be geometrically accurate at the nanometer scale. A Vision-Language Model (VLM) converts inspection images into editable Domain-Specific Language (DSL) code, which gives precise control over the training data that is generated. Because the VLM is trained on synthetic DSL-rendered data, it faces a domain gap when it processes real Scanning Electron Microscope (SEM) images. The proposed remedy is input binarization, which strips SEM-specific texture and noise so the model can focus on geometric structure. On the MIIC dataset, binarized inputs raised the mean Dice coefficient from 0.4393 to 0.5256 over a raw-input baseline (an absolute gain of 0.0863, roughly 20% relative), which suggests that simple texture abstraction can substantially narrow the sim-to-real gap.
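For readers less familiar with the metric, here is a minimal sketch of how a Dice coefficient between two binary masks is usually computed. The summary does not say which masks the article compares (presumably a rendering of the predicted program against a reference mask), so the function name `dice_coefficient` and the empty-mask convention below are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def dice_coefficient(pred, target) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / total)


if __name__ == "__main__":
    predicted = [[1, 1, 0, 0]]
    reference = [[0, 1, 1, 0]]
    # One overlapping pixel, two foreground pixels in each mask: 2*1 / (2+2)
    print(dice_coefficient(predicted, reference))  # 0.5
```

A score of 1.0 means the two masks overlap exactly and 0.0 means they share no foreground pixels, so the reported means of 0.4393 and 0.5256 both indicate partial overlap.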

Key takeaway

If you build vision systems for semiconductor metrology and your models, trained on synthetic data, struggle with the texture and noise of real Scanning Electron Microscope (SEM) images, consider binarizing the inputs. On the MIIC dataset this raised the mean Dice coefficient from 0.4393 to 0.5256 over a raw-input baseline. Stripping SEM texture lets the Vision-Language Model focus on geometric structure, which narrows the sim-to-real domain gap.

Key insights

Input binarization effectively bridges the sim-to-real gap for VLMs in semiconductor visual program synthesis by abstracting texture.

Principles

Method

A Vision-Language Model (VLM) converts inspection images into Domain-Specific Language (DSL) code. Input binarization strips SEM-specific texture and noise before VLM processing.
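The binarization step can be sketched as a simple preprocessing function applied before the image reaches the VLM. The summary does not state which thresholding rule the article uses, so this example assumes Otsu's global threshold on an 8-bit grayscale image and assumes that features are brighter than the background; the names `otsu_threshold` and `binarize` are hypothetical.

```python
import numpy as np


def otsu_threshold(image: np.ndarray) -> int:
    """Return the Otsu threshold of an 8-bit grayscale image.

    Picks the gray level that maximizes the between-class variance of the
    two pixel groups (<= threshold and > threshold).
    """
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # weight of the dark class
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean of the dark class
    mu_total = mu[-1]

    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between_var = np.where(
            denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0
        )
    return int(np.argmax(between_var))


def binarize(image: np.ndarray) -> np.ndarray:
    """Map a grayscale image to a 0/1 mask, discarding texture and noise."""
    if image.dtype != np.uint8:
        raise TypeError("expected an 8-bit grayscale image (dtype uint8)")
    threshold = otsu_threshold(image)
    # Assumption: pattern features are brighter than the background.
    return (image > threshold).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic stand-in for an SEM crop: a bright stripe plus Gaussian noise.
    rng = np.random.default_rng(0)
    truth = np.zeros((64, 64), dtype=np.uint8)
    truth[:, 20:40] = 1
    noisy = 50 + 150 * truth + rng.normal(0, 10, truth.shape)
    sem_like = np.clip(noisy, 0, 255).astype(np.uint8)

    mask = binarize(sem_like)
    print(mask.shape, mask.min(), mask.max())
```

A global threshold is the simplest choice; real SEM images with uneven brightness or edge-bright line profiles might need denoising or an adaptive threshold first, which is outside what the summary describes.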

Best for: AI Scientist, Research Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.