FllumaOne: A Code-Native Multimodal CAD Dataset with Executable Programs and Kernel-Validated Feature Histories

2026-06-16 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Expert, quick

Summary

FllumaOne is a code-native multimodal CAD dataset for editable CAD research. Its models are generated by executable Python programs in the Flluma CAD system, which is based on Qt/C++ OpenCASCADE. Each sample aligns its Python program with a structured feature tree, a training-oriented intermediate representation, STEP geometry, a surface point cloud, natural-language descriptions, metadata, and eight canonical visible-edge renderings. The primary release, FllumaOne-100K, comprises 100,000 validated samples across four template-level complexity regimes. Programs must pass kernel geometry, solid validity, and export checks before inclusion. A Qwen2.5-Coder-1.5B LoRA baseline trained on 80,000 samples was evaluated on a 10,000-sample test split, where it reached 99.98% Python syntax validity, 99.97% Flluma build success, and 99.14% STEP-export validity. For the 9,909 predictions that were converted to point clouds, the mean normalized Chamfer Distance was 0.002124. The dataset supports applications including conditioned CAD reconstruction, executable program synthesis, and editable reverse engineering.
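The summary reports a mean normalized Chamfer Distance but does not spell out the convention behind it. The sketch below shows one common form of the metric, assuming each cloud is centered and scaled to a unit bounding-box diagonal and that squared nearest-neighbor distances are averaged in both directions. Both choices are assumptions, not the authors' stated method, and the function names are illustrative.

```python
import math

def normalize(points):
    """Center a cloud and scale it so its bounding-box diagonal is 1.

    Assumption: the source does not state its normalization convention.
    """
    dims = range(3)
    lo = [min(p[i] for p in points) for i in dims]
    hi = [max(p[i] for p in points) for i in dims]
    center = [(lo[i] + hi[i]) / 2 for i in dims]
    diag = math.dist(lo, hi) or 1.0  # guard against a degenerate cloud
    return [tuple((p[i] - center[i]) / diag for i in dims) for p in points]

def _mean_nearest_sq(src, dst):
    """Mean squared distance from each point in src to its nearest point in dst."""
    total = 0.0
    for p in src:
        total += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst)
    return total / len(src)

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance (squared form) between two normalized clouds."""
    a, b = normalize(a), normalize(b)
    return _mean_nearest_sq(a, b) + _mean_nearest_sq(b, a)
```

This brute-force version is quadratic in the number of points, so it only suits small clouds; a spatial index would be the usual choice at dataset scale. Note that normalizing each cloud independently makes the score insensitive to translation and uniform scale errors.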

Key takeaway

For AI scientists and machine learning engineers building CAD generation or editing models, FllumaOne provides a kernel-validated multimodal dataset. Its executable Python programs and feature histories can be used to train models for tasks such as conditioned CAD reconstruction or program synthesis. Because every sample has already passed kernel geometry, solid validity, and export checks, you can spend less effort on data validation and more on the model itself.

Key insights

FllumaOne is a multimodal CAD dataset that links executable code to geometry and feature histories for editable CAD research.

Principles

Editable CAD research requires datasets exposing operations, parameters, and dependencies.
Validated geometry and construction history are crucial for CAD datasets.
Multimodal data improves CAD model understanding and synthesis.
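To make "operations, parameters, and dependencies" concrete, here is a minimal sketch of a feature history as a dependency graph. The `Feature` fields and operation names are hypothetical and do not reflect FllumaOne's actual feature-tree schema, which the summary does not describe.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

@dataclass
class Feature:
    """One node in a hypothetical feature history."""
    name: str
    op: str                                   # e.g. "sketch", "extrude"
    params: dict = field(default_factory=dict)
    depends_on: tuple = ()                    # names of parent features

def rebuild_order(features):
    """Return feature names ordered so every parent precedes its children."""
    graph = {f.name: set(f.depends_on) for f in features}
    return list(TopologicalSorter(graph).static_order())

def affected_by(features, edited):
    """Names of features that must be recomputed after editing `edited`."""
    by_name = {f.name: f for f in features}
    dirty = {edited}
    for name in rebuild_order(features):
        if dirty.intersection(by_name[name].depends_on):
            dirty.add(name)
    return dirty - {edited}

history = [
    Feature("sketch1", "sketch", {"width": 40.0, "height": 20.0}),
    Feature("extrude1", "extrude", {"depth": 10.0}, ("sketch1",)),
    Feature("fillet1", "fillet", {"radius": 2.0}, ("extrude1",)),
    Feature("hole1", "hole", {"diameter": 5.0}, ("extrude1",)),
]
```

Calling `affected_by(history, "extrude1")` returns the fillet and the hole, which is the kind of edit propagation that a bare mesh or point cloud cannot express.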

Method

Models are generated by executable Python programs in Flluma, then validated via kernel geometry, solid validity, and export checks before inclusion in the dataset.
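The gating logic described above can be sketched as an ordered list of checks where a sample is kept only if all of them pass. The summary does not document Flluma's API, so the `Candidate` fields and the three check bodies below are stand-ins chosen for illustration; a real pipeline would query the CAD kernel instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Candidate:
    """Hypothetical container for one generated model awaiting validation."""
    program: str
    is_closed_solid: bool   # stand-in for a kernel solid-validity result
    step_text: str          # stand-in for exported STEP content

Check = Callable[[Candidate], bool]

def validate(candidate: Candidate, checks: List[Tuple[str, Check]]) -> Optional[str]:
    """Return the name of the first failed check, or None if all pass."""
    for name, check in checks:
        try:
            passed = check(candidate)
        except Exception:
            passed = False  # a crashing check counts as a failure
        if not passed:
            return name
    return None

# Stand-in checks (assumptions, not Flluma calls).
CHECKS: List[Tuple[str, Check]] = [
    ("kernel_geometry", lambda c: bool(c.program.strip())),
    ("solid_validity", lambda c: c.is_closed_solid),
    ("step_export", lambda c: c.step_text.startswith("ISO-10303-21;")),
]

def filter_valid(candidates):
    """Keep only candidates that pass every check."""
    return [c for c in candidates if validate(c, CHECKS) is None]
```

Returning the name of the first failed check, rather than a bare boolean, makes it easy to tally why samples were rejected.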

In practice

Train models for conditioned CAD reconstruction.
Develop systems for executable program synthesis.
Explore B-Rep analysis and design completion.
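When evaluating a program-synthesis model, syntax validity is typically the first gate before any build or export check. The summary does not say how the baseline's syntax validity was measured; the sketch below shows one straightforward approach using the standard library's `ast.parse`, which parses a program without executing it. The sample program strings are made up and do not use real Flluma calls.

```python
import ast

def is_valid_python(source: str) -> bool:
    """True if `source` parses as Python; the code is never executed."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True

def syntax_validity_rate(programs) -> float:
    """Fraction of generated programs that parse successfully."""
    programs = list(programs)
    if not programs:
        raise ValueError("no programs to evaluate")
    return sum(is_valid_python(p) for p in programs) / len(programs)

# Hypothetical model outputs; `make_box` is an invented name.
predictions = [
    "box = make_box(10, 20, 5)",
    "box = make_box(10, 20, 5",      # missing closing parenthesis
]
```

A parse check only confirms that the text is well-formed Python. Whether the program builds a valid solid is a separate question that requires running it in the CAD system.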

Topics

FllumaOne Dataset
Parametric CAD
Multimodal Datasets
Program Synthesis
Feature Trees
B-Rep Analysis

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.