MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation
Summary
MUSE is a new Text-to-CAD benchmark that addresses a gap in evaluating complex, editable boundary representation (B-Rep) assemblies for industrial product design. Developed by The Hong Kong Polytechnic University and Curvature Flow Co., Limited, MUSE pairs 106 practical design instances with structured Design Specifications. It introduces a three-stage evaluation protocol: a code check, a geometric check (watertightness, no self-intersections, manifoldness, no overlaps), and a design-intent alignment check that assesses functionality, manufacturability, and assemblability against design-specific rubrics. A rubric-based vision-language model (VLM) judge, validated by human annotation, makes the evaluation scalable. Experiments with closed-source and open-source LLMs reveal a clear failure cascade: even strong models achieve limited success on fine-grained engineering criteria, which highlights the need to advance Text-to-CAD beyond geometric generation.
Key takeaway
If you are an AI engineer developing Text-to-CAD models, prioritize engineering-ready design over mere geometric similarity. Your models should satisfy functionality, manufacturability, and assemblability criteria, not just visual resemblance. Use structured Design Specifications and multi-stage evaluation protocols like MUSE's to locate where outputs fail along the cascade from executable code to valid geometry and, ultimately, to practical engineering designs. This focus steers development toward CAD output that is usable in industrial practice.
Key insights
Text-to-CAD requires evaluation beyond geometric similarity, focusing on functionality, manufacturability, and assemblability for industrial design.
Principles
- Text-to-CAD models exhibit a failure cascade from code executability to engineering-ready design.
- Design evaluation must move beyond shape matching to capture practical design quality.
- Structured Design Specifications are crucial for defining and assessing complex CAD instances.
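The failure cascade in the first principle can be made concrete by recording, for each generated sample, the first stage it fails and then reporting how many samples survive each stage. The following is a minimal sketch, not the benchmark's own tooling: the stage names, the per-sample dictionary layout, and the toy outcomes are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from collections import Counter

# Stage names are assumed for illustration; they mirror the three stages
# described in this summary (code, geometry, design intent).
STAGES = ("code", "geometry", "intent")


def first_failure(result):
    """Return the first stage a sample fails, or None if it passes every stage."""
    for stage in STAGES:
        if not result.get(stage, False):
            return stage
    return None


def cascade_report(results):
    """Return the fraction of all samples that survive each stage, in order."""
    total = len(results)
    failures = Counter(first_failure(r) for r in results)
    surviving = total
    report = {}
    for stage in STAGES:
        # A sample that fails an earlier stage never reaches later ones.
        surviving -= failures.get(stage, 0)
        report[stage] = surviving / total if total else 0.0
    return report


# Made-up toy outcomes, NOT results from the paper.
toy_results = [
    {"code": True, "geometry": True, "intent": True},
    {"code": True, "geometry": True, "intent": False},
    {"code": True, "geometry": False, "intent": False},
    {"code": False, "geometry": False, "intent": False},
]

report = cascade_report(toy_results)
for stage in STAGES:
    print(f"{stage:>8}: {report[stage]:.0%} of samples survive")
```

Reporting cumulative survival rather than independent per-stage pass rates keeps the stages comparable: a drop between two adjacent rows shows exactly how many samples were lost at that step.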
Method
MUSE evaluates Text-to-CAD through a three-stage protocol: a code execution check, geometric validity checks (watertightness, manifoldness, no self-intersections, no overlaps), and design-intent alignment scored by a rubric-based VLM judge.
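To give a feel for what a geometric validity check involves, here is a minimal sketch of two of the named properties on a triangle mesh. It rests on a standard fact: in a closed, edge-manifold triangle mesh every edge is shared by exactly two triangles. This is a simplified stand-in, since MUSE targets B-Rep models and this summary does not describe how its checks are implemented. The function names are hypothetical, and the sketch does not cover self-intersection or overlap detection.

```python
from collections import Counter


def edge_counts(faces):
    """Count how many triangles use each undirected edge."""
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def is_watertight(faces):
    """True if every edge is shared by exactly two triangles (no open boundary)."""
    counts = edge_counts(faces)
    return bool(counts) and all(n == 2 for n in counts.values())


def has_non_manifold_edges(faces):
    """True if any edge is shared by more than two triangles."""
    return any(n > 2 for n in edge_counts(faces).values())


# A tetrahedron, given as triangles over vertex indices 0..3, is closed.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]
# Dropping one face leaves three boundary edges.
open_tetra = tetra[:-1]
# Attaching an extra "fin" triangle to edge (0, 1) makes that edge non-manifold.
finned_tetra = tetra + [(0, 1, 4)]

print(is_watertight(tetra), is_watertight(open_tetra))
print(has_non_manifold_edges(tetra), has_non_manifold_edges(finned_tetra))
```

In a B-Rep setting the same questions would normally be answered by the CAD kernel's own validity analysis rather than by counting mesh edges.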
In practice
- Define CAD design goals using structured Design Specifications (S = ⟨D, G, Ω, M⟩).
- Generate engineering views (Top, Front, Right, Isometric) for robust visual assessment of CAD models.
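The four engineering views can be set up by placing a camera along a fixed direction from the model's center. The sketch below computes those camera positions under assumed conventions: a right-handed coordinate system with Z up and the front of the part facing -Y. The `margin` parameter and all function names are hypothetical; the summary does not state which axes or renderer MUSE uses, so the result would need adapting to your own CAD kernel or rendering library.

```python
import math

_ISO = 1.0 / math.sqrt(3.0)

# Unit vectors pointing from the model center toward the camera.
# Assumed convention: right-handed axes, Z up, front of the part faces -Y.
VIEW_DIRECTIONS = {
    "Top": (0.0, 0.0, 1.0),
    "Front": (0.0, -1.0, 0.0),
    "Right": (1.0, 0.0, 0.0),
    "Isometric": (_ISO, -_ISO, _ISO),
}


def bounding_sphere(points):
    """Return the bounding-box center and a radius that encloses all points."""
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))
    radius = max(math.dist(center, p) for p in points)
    return center, radius


def camera_position(center, radius, view, margin=2.5):
    """Place the camera `margin` radii from the center along the view direction."""
    direction = VIEW_DIRECTIONS[view]
    distance = radius * margin
    return tuple(c + d * distance for c, d in zip(center, direction))


# Corners of a 2 x 2 x 2 cube centered at the origin, used as a toy model.
cube = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
center, radius = bounding_sphere(cube)
cameras = {view: camera_position(center, radius, view) for view in VIEW_DIRECTIONS}
for view, position in cameras.items():
    print(view, tuple(round(v, 3) for v in position))
```

Scaling the camera distance by the bounding radius keeps parts of very different sizes framed consistently, which helps when the rendered views are compared by a judge model.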
Topics
- Text-to-CAD
- CAD Benchmarking
- Design Specifications
- Geometric Validity
- Manufacturability
- VLM Evaluation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.