MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, long

Summary

MUSE is a new Text-to-CAD benchmark that addresses a gap in evaluating complex, editable boundary representation (B-Rep) assemblies for industrial product design. Developed by The Hong Kong Polytechnic University and Curvature Flow Co., Limited, MUSE pairs 106 practical design instances with structured Design Specifications. It introduces a three-stage evaluation protocol: a code check, a geometric check (watertightness, manifoldness, no self-intersections, no overlaps), and a design-intent alignment stage that assesses functionality, manufacturability, and assemblability using design-specific rubrics. A rubric-based vision-language model (VLM) judge, validated against human annotation, makes the evaluation scalable. Experiments with closed-source and open-source LLMs reveal a clear failure cascade: even strong models achieve limited success on fine-grained engineering criteria, which highlights the need to advance Text-to-CAD beyond geometric generation.

Key takeaway

If you are developing Text-to-CAD models, prioritize engineering-ready design over mere geometric similarity. Your models should satisfy functionality, manufacturability, and assemblability criteria, not just visual resemblance. Use structured Design Specifications and a multi-stage evaluation protocol like MUSE's to find where outputs fail along the cascade from executable code to valid geometry and, ultimately, to practical engineering designs. This guides development towards CAD generation that is usable in industrial settings.

Key insights

Text-to-CAD requires evaluation beyond geometric similarity, focusing on functionality, manufacturability, and assemblability for industrial design.

Principles

Text-to-CAD models exhibit a failure cascade from code executability to engineering-ready design.
Design evaluation must move beyond shape matching to capture practical design quality.
Structured Design Specifications are crucial for defining and assessing complex CAD instances.
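
As a rough illustration of how such a cascade can be tallied (this is not MUSE's own tooling), the sketch below assumes each generated sample is labeled with the first stage it fails, or None if it clears all three. The stage names and the function name are hypothetical, chosen only to mirror the protocol's ordering.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Stage order follows the three-stage protocol; the identifiers are illustrative.
STAGES = ("code", "geometry", "design_intent")


def cascade(first_failures: Iterable[Optional[str]]) -> List[Tuple[str, float]]:
    """Return the fraction of samples still passing after each stage.

    first_failures holds, per sample, the name of the first stage it failed,
    or None if it passed every stage.
    """
    failures = list(first_failures)
    total = len(failures)
    if total == 0:
        raise ValueError("no samples to tally")
    counts = Counter(failures)
    surviving = total
    rates = []
    for stage in STAGES:
        # Samples that failed at this stage drop out of all later stages.
        surviving -= counts.get(stage, 0)
        rates.append((stage, surviving / total))
    return rates
```

With the made-up labels ["code", None, "geometry", "design_intent"], the surviving fraction falls from 0.75 after the code stage to 0.5 after geometry and 0.25 after design intent.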

Method

MUSE evaluates Text-to-CAD through a three-stage protocol: a code check, a geometric validity check (watertightness, manifoldness, no self-intersections, no overlaps), and design-intent alignment scored by a rubric-based VLM judge.
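
A minimal sketch of the gating idea, under stated assumptions: stages run in order and a sample is charged to the first stage it fails. MUSE operates on B-Rep assemblies; the geometric check here is only a triangle-mesh proxy (every edge shared by exactly two faces), and the sample dictionary keys, the stage functions, and the rubric threshold are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Tuple

Triangle = Tuple[int, int, int]


def is_closed_manifold(triangles: Sequence[Triangle]) -> bool:
    """Mesh-level proxy for watertightness and edge-manifoldness.

    An edge used once marks an open boundary; an edge used more than twice
    is non-manifold. Non-manifold vertices and self-intersections are NOT
    detected by this simple count.
    """
    edge_counts: Counter = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_counts[(min(u, v), max(u, v))] += 1
    return bool(edge_counts) and all(n == 2 for n in edge_counts.values())


def first_failed_stage(
    sample: Dict,
    stages: Sequence[Tuple[str, Callable[[Dict], bool]]],
) -> Optional[str]:
    """Run stages in order; return the first failing stage name, or None."""
    for name, check in stages:
        if not check(sample):
            return name  # later stages are skipped, as in a gated protocol
    return None


# Hypothetical stage functions standing in for the real checks.
STAGES = (
    ("code", lambda s: s["executed"]),
    ("geometry", lambda s: is_closed_manifold(s["triangles"])),
    ("design_intent", lambda s: s["rubric_score"] >= 1.0),  # assumed threshold
)

# Four faces of a tetrahedron: a closed, manifold toy mesh.
TETRAHEDRON = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
```

Because the loop returns at the first failure, a sample whose code never executed is not required to carry any geometry at all, which is the practical point of ordering the stages this way.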

In practice

Define CAD design goals using structured Design Specifications (S = ⟨D, G, Ω, M⟩).
Generate engineering views (Top, Front, Right, Isometric) for robust visual assessment of CAD models.
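
One way to set up the four views is sketched below as plain orthographic projections. It assumes a right-handed, Z-up coordinate system, with the Front view taken from the -Y side and the Isometric view from the (1, 1, 1) direction; these conventions and all names are illustrative and may differ from those used by MUSE or by a given CAD kernel.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(_dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def view_basis(toward_camera: Vec3, up_hint: Vec3) -> Tuple[Vec3, Vec3]:
    """Return the (right, up) screen axes for a camera in the given direction."""
    d = _normalize(toward_camera)
    k = _dot(up_hint, d)
    # Remove the component of the hint along the view direction.
    up = _normalize((up_hint[0] - k * d[0],
                     up_hint[1] - k * d[1],
                     up_hint[2] - k * d[2]))
    right = _cross(up, d)  # chosen so that right x up points at the camera
    return right, up


# (direction from model toward camera, up hint); assumed conventions.
VIEWS: Dict[str, Tuple[Vec3, Vec3]] = {
    "Top": ((0.0, 0.0, 1.0), (0.0, 1.0, 0.0)),
    "Front": ((0.0, -1.0, 0.0), (0.0, 0.0, 1.0)),
    "Right": ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    "Isometric": ((1.0, 1.0, 1.0), (0.0, 0.0, 1.0)),
}


def project(point: Vec3, view: str) -> Tuple[float, float]:
    """Orthographically project a 3D point to 2D screen coordinates."""
    right, up = view_basis(*VIEWS[view])
    return (_dot(point, right), _dot(point, up))
```

Under these conventions the Top view keeps (x, y), the Front view keeps (x, z), and the Right view keeps (y, z), so a feature hidden in one view is usually visible in another.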

Topics

Text-to-CAD
CAD Benchmarking
Design Specifications
Geometric Validity
Manufacturability
VLM Evaluation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.