OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning

Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Expert, long

Summary

OmniSch is introduced as the first comprehensive benchmark designed to evaluate large multimodal models (LMMs) on Printed Circuit Board (PCB) schematic understanding and machine-readable spatially weighted netlist graph construction. Comprising 1,854 real-world schematic diagrams, OmniSch includes 109.9K grounded instances aligning 423.4K diagram semantic labels and 219.8K net graphs. The benchmark assesses LMMs across four tasks: visual grounding for schematic entities, diagram-to-graph reasoning, geometric reasoning for layout-dependent weights, and tool-augmented agentic reasoning for visual search. Initial evaluations using models like Claude-Sonnet-4.6, Gemini-3.1-Pro, and GPT-5.2 reveal significant limitations in current LMMs, including unreliable fine-grained grounding, brittle layout-to-graph parsing, inconsistent global connectivity reasoning, and inefficient visual exploration of complex engineering artifacts.
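
To make the diagram-to-graph task concrete, the sketch below shows one possible way to score a predicted netlist against a reference by comparing which pin pairs share a net. This is an illustrative assumption, not OmniSch's published metric: the function names, the `"PART.PIN"` string convention, and the sample netlists are all hypothetical.

```python
from itertools import combinations


def connected_pairs(netlist):
    """Expand {net_name: [pin, ...]} into the set of unordered pin pairs sharing a net."""
    pairs = set()
    for pins in netlist.values():
        # Sorting makes each pair canonical, so (a, b) and (b, a) collapse to one entry.
        for a, b in combinations(sorted(set(pins)), 2):
            pairs.add((a, b))
    return pairs


def connectivity_f1(predicted, reference):
    """Pairwise connectivity F1 (hypothetical metric); net names are ignored."""
    pred = connected_pairs(predicted)
    ref = connected_pairs(reference)
    if not pred and not ref:
        return 1.0
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction attaches C1.1 to the wrong net.
reference = {"VCC": ["R1.1", "U1.VDD", "C1.1"], "GND": ["R1.2", "U1.GND"]}
predicted = {"N1": ["R1.1", "U1.VDD"], "N2": ["R1.2", "U1.GND", "C1.1"]}
score = connectivity_f1(predicted, reference)
print(score)
```

Comparing pin pairs rather than net names means a model is not penalized for labeling a net differently, only for connecting the wrong pins; a single misplaced pin in the example above affects several pairs at once, which is one reason global connectivity errors can be costly.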

Key takeaway

AI scientists and machine learning engineers developing LMMs for technical diagrams should recognize that current models exhibit substantial weaknesses in PCB schematic understanding, particularly in fine-grained grounding and topological reasoning. Prioritize developing models that can infer circuit topology from visual and structural cues, rather than relying solely on textual annotations. Integrating tool-augmented agentic workflows with explicit localization cues and interactive visual search capabilities can significantly enhance performance in schematic-to-netlist generation tasks.

Key insights

Current LMMs struggle with fine-grained visual grounding and topological reasoning for complex PCB schematics, necessitating specialized benchmarks.

Method

OmniSch uses an automated pipeline with an EDA generative rendering engine to parse EAGLE XML files, re-simulate rendering, and export pixel-aligned annotations for symbols, pins, text, and spatially weighted netlist graphs.
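
As a rough illustration of the parsing step, the sketch below reads a trimmed EAGLE-style schematic fragment with Python's standard `xml.etree.ElementTree` and collects, per net, the connected pins and the total drawn wire length. It is not the OmniSch pipeline: the sample XML is hypothetical, the helper name `parse_nets` is invented here, and using summed wire length as the spatial weight is an assumption made only for this example.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAGLE-style schematic (not taken from the OmniSch dataset).
SAMPLE = """<eagle version="9.6.2">
  <drawing>
    <schematic>
      <sheets>
        <sheet>
          <nets>
            <net name="VCC">
              <segment>
                <pinref part="R1" gate="G$1" pin="1"/>
                <pinref part="U1" gate="A" pin="VDD"/>
                <wire x1="0" y1="0" x2="3" y2="0" width="0.1524" layer="91"/>
                <wire x1="3" y1="0" x2="3" y2="4" width="0.1524" layer="91"/>
              </segment>
            </net>
            <net name="GND">
              <segment>
                <pinref part="R1" gate="G$1" pin="2"/>
                <pinref part="U1" gate="A" pin="GND"/>
                <wire x1="0" y1="-5" x2="6" y2="-5" width="0.1524" layer="91"/>
              </segment>
            </net>
          </nets>
        </sheet>
      </sheets>
    </schematic>
  </drawing>
</eagle>"""


def parse_nets(xml_text):
    """Return {net_name: {"pins": [(part, pin), ...], "wire_length": float}}."""
    root = ET.fromstring(xml_text)
    nets = {}
    for net in root.iter("net"):
        entry = nets.setdefault(net.get("name"), {"pins": [], "wire_length": 0.0})
        for segment in net.iter("segment"):
            # Each pinref names a part and one of its pins attached to this net.
            for ref in segment.iter("pinref"):
                entry["pins"].append((ref.get("part"), ref.get("pin")))
            # Assumed spatial weight: sum of Euclidean lengths of the drawn wires.
            for wire in segment.iter("wire"):
                x1, y1, x2, y2 = (float(wire.get(k)) for k in ("x1", "y1", "x2", "y2"))
                entry["wire_length"] += math.hypot(x2 - x1, y2 - y1)
    return nets


nets = parse_nets(SAMPLE)
for name, info in nets.items():
    print(name, info["pins"], info["wire_length"])
```

Because the schematic file already records which pins belong to which net, annotations derived this way need no manual labeling; the rendering and pixel-alignment stages described above are outside the scope of this sketch.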

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.