FllumaOne: A Code-Native Multimodal CAD Dataset with Executable Programs and Kernel-Validated Feature Histories
Summary
FllumaOne is a code-native multimodal CAD dataset for editable-CAD research. Its models are generated by executable Python programs in Flluma, a CAD system built on Qt/C++ and OpenCASCADE. Each sample aligns its Python program with a structured feature tree, a training-oriented intermediate representation, STEP geometry, a surface point cloud, natural-language descriptions, metadata, and eight canonical visible-edge renderings. The primary release, FllumaOne-100K, contains 100,000 validated samples across four template-level complexity regimes. A program is included only after it passes kernel geometry, solid validity, and export checks. A Qwen2.5-Coder-1.5B LoRA baseline, trained on 80,000 samples and evaluated on a 10,000-sample test split, reached 99.98% Python syntax validity, 99.97% Flluma build success, and 99.14% STEP-export validity. For the 9,909 predictions converted to point clouds, the mean normalized Chamfer Distance was 0.002124. The dataset supports applications including conditioned CAD reconstruction, executable program synthesis, and editable reverse engineering.
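To make the reported geometry metric concrete, here is a minimal pure-Python sketch of a normalized Chamfer Distance between two point clouds. The summary does not say how the paper normalizes the clouds or whether it uses squared distances, so the bounding-box scaling and the squared-distance form below are assumptions, and the function names are illustrative only.

```python
import math


def normalize(points):
    """Center a cloud and scale it so its bounding-box diagonal is 1.

    Assumption: the source does not specify the normalization it uses.
    """
    dims = range(3)
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    center = [(lo[d] + hi[d]) / 2.0 for d in dims]
    diagonal = math.dist(lo, hi) or 1.0  # guard against a degenerate cloud
    return [tuple((p[d] - center[d]) / diagonal for d in dims) for p in points]


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance: mean squared nearest-neighbour distance,
    summed over both directions (a to b and b to a)."""

    def one_way(src, dst):
        total = 0.0
        for s in src:
            total += min(sum((s[d] - t[d]) ** 2 for d in range(3)) for t in dst)
        return total / len(src)

    return one_way(a, b) + one_way(b, a)
```

This brute-force version is quadratic in the number of points, which is fine for illustration; evaluating thousands of predictions would normally use a spatial index or a vectorized implementation.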
Key takeaway
For AI scientists and machine learning engineers building CAD generation or editing models, FllumaOne provides a kernel-validated multimodal dataset. Its executable Python programs and feature histories can be used to train models for tasks such as conditioned CAD reconstruction and program synthesis. Because every sample has already passed kernel geometry, solid validity, and export checks, teams should need to spend less effort on data validation and can put more of it into modeling.
Key insights
FllumaOne is a multimodal CAD dataset that links executable code to geometry and feature histories for editable-CAD research.
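One way to picture this alignment is as a single record per sample holding every modality the summary lists. The sketch below is hypothetical: the class name, field names, and types are assumptions for illustration and do not reflect the dataset's actual schema or file layout.

```python
from dataclasses import dataclass, fields
from typing import Dict, List

EXPECTED_RENDERINGS = 8  # the summary states eight canonical visible-edge renderings


@dataclass
class CadSample:
    """Hypothetical container for one aligned sample (all names are assumed)."""

    program: str                # executable Python source
    feature_tree: Dict          # structured feature tree
    intermediate_repr: str      # training-oriented intermediate representation
    step_path: str              # path to the STEP geometry file
    point_cloud_path: str       # path to the surface point cloud
    descriptions: List[str]     # natural-language descriptions
    metadata: Dict              # sample-level metadata
    rendering_paths: List[str]  # visible-edge renderings

    def problems(self) -> List[str]:
        """Return the names of empty modalities, plus a note if the
        number of renderings is not the expected eight."""
        issues = [f.name for f in fields(self) if not getattr(self, f.name)]
        if self.rendering_paths and len(self.rendering_paths) != EXPECTED_RENDERINGS:
            issues.append("rendering_paths: expected 8 views")
        return issues
```

A loader built on a record like this could call `problems()` on each sample and skip or log any that come back non-empty.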
Principles
- Editable CAD research requires datasets exposing operations, parameters, and dependencies.
- Validated geometry and construction history are crucial for CAD datasets.
- Multimodal data improves CAD model understanding and synthesis.
Method
Models are generated by executable Python programs in Flluma, then validated via kernel geometry, solid validity, and export checks before inclusion in the dataset.
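The gating described above can be sketched as an ordered list of checks in which a sample is rejected at the first failure. This is only an illustration of the pattern: Flluma's API is not described in the summary, so the kernel geometry, solid validity, and export stages appear as placeholders, and `syntax_ok` is a stand-in check added so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A check receives a candidate program's source and returns True when it passes.
Check = Callable[[str], bool]


@dataclass
class GateResult:
    accepted: bool
    failed_stage: str = ""  # empty when every stage passed


def run_gate(program: str, stages: List[Tuple[str, Check]]) -> GateResult:
    """Run the stages in order and reject at the first failure."""
    for name, check in stages:
        try:
            ok = check(program)
        except Exception:
            ok = False  # a stage that crashes counts as a failure
        if not ok:
            return GateResult(False, name)
    return GateResult(True)


def syntax_ok(program: str) -> bool:
    """Stand-in check (not from the source): does the text compile as Python?"""
    try:
        compile(program, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


# Placeholder stages named after the checks in the summary; real versions
# would call into the CAD kernel, which is outside the scope of this sketch.
STAGES: List[Tuple[str, Check]] = [
    ("syntax", syntax_ok),
    ("kernel_geometry", lambda program: True),
    ("solid_validity", lambda program: True),
    ("export", lambda program: True),
]
```

Calling `run_gate(source, STAGES)` returns a `GateResult` whose `failed_stage` records where a rejected program dropped out, which makes per-stage failure rates easy to tally.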
In practice
- Train models for conditioned CAD reconstruction.
- Develop systems for executable program synthesis.
- Explore B-Rep analysis and design completion.
Topics
- FllumaOne Dataset
- Parametric CAD
- Multimodal Datasets
- Program Synthesis
- Feature Trees
- B-Rep Analysis
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.