Assessing the Business Process Modeling Competences of Large Language Models

Source: cs.SE updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Business Process Management) · Depth: Expert, extended

Summary

BEF4LLM is a novel evaluation framework for assessing how well Large Language Models (LLMs) generate Business Process Model and Notation (BPMN) models from natural-language descriptions. The framework comprises four perspectives: syntactic quality, pragmatic quality, semantic quality, and validity. The authors evaluated 17 open-source LLMs, including Llama 3, Qwen 2.5, Qwen 3, and DeepSeek-R1, on 105 curated text-BPMN pairs, with 5 runs per sample. LLMs perform well on syntactic and pragmatic quality (scores consistently above 0.75 and 0.8, respectively), but human experts outperform them on semantic quality (human score: 0.5152). Validity, in particular producing valid BPMN-XML files, remains a major challenge for most LLMs; Llama 3.1 70B achieves the highest validity at 97.33%. Larger LLMs do not always perform better, and in some cases they score lower on pragmatic quality.
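
To make the "semantic quality" perspective more concrete: one simple way to compare a generated model with its ground truth is to match activity labels. The summary does not say how BEF4LLM measures semantic quality, so the sketch below is an illustrative stand-in, not one of the framework's 39 metrics. It computes a set-based F1 over normalized labels; the names `normalize` and `label_f1` are hypothetical, and a realistic semantic metric would also need fuzzy label matching and a comparison of control flow.

```python
import re


def normalize(label: str) -> str:
    """Lowercase and keep only alphanumeric tokens, so 'Check  Stock!' == 'check stock'."""
    return " ".join(re.findall(r"[a-z0-9]+", label.lower()))


def label_f1(generated: list[str], reference: list[str]) -> float:
    """F1 over exact matches of normalized activity labels (set-based, illustrative only)."""
    gen = {n for n in map(normalize, generated) if n}
    ref = {n for n in map(normalize, reference) if n}
    if not gen and not ref:
        return 1.0  # two empty models agree trivially
    matched = len(gen & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(gen)  # share of generated labels found in the reference
    recall = matched / len(ref)     # share of reference labels the model reproduced
    return 2 * precision * recall / (precision + recall)
```

For example, three generated labels that all appear among four reference labels give precision 1.0, recall 0.75, and an F1 of about 0.857.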

Key takeaway

For AI scientists and ML engineers building LLM-driven BPM tools: prioritize instruction-tuned models, and implement robust validation and refinement of BPMN-XML output. Larger models tend to improve syntactic and semantic quality, but smaller LLMs often deliver better pragmatic results, so parameter count alone does not guarantee overall quality; choose a model based on which quality dimension matters most for the use case. Focus fine-tuning on semantic accuracy and valid XML generation to make practical deployment more reliable.
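
The validation-and-refinement advice can be sketched as a loop: parse the model's output, run a few structural checks, and re-prompt with the concrete errors if any check fails. This is a minimal sketch under stated assumptions. It checks only XML well-formedness, a `definitions` root in the BPMN 2.0 model namespace, the presence of a `process`, and that every `sequenceFlow` references existing element ids. That is far weaker than full BPMN 2.0 schema validation and is not the paper's validity criterion. The `generate` parameter stands in for a hypothetical LLM call that maps a prompt to XML text.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# BPMN 2.0 model namespace (from the OMG BPMN 2.0 specification)
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def bpmn_errors(xml_text: str) -> list[str]:
    """Return structural problems found in xml_text; an empty list means all checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    errors = []
    if root.tag != f"{{{BPMN_NS}}}definitions":
        errors.append(f"root element is {root.tag}, expected bpmn:definitions")
    if root.find(f"{{{BPMN_NS}}}process") is None:
        errors.append("no bpmn:process element under definitions")

    # Every sequenceFlow must point at elements that actually exist in the file
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    for flow in root.iter(f"{{{BPMN_NS}}}sequenceFlow"):
        for attr in ("sourceRef", "targetRef"):
            ref = flow.get(attr)
            if ref not in ids:
                errors.append(f"sequenceFlow {flow.get('id')} has dangling {attr}={ref}")
    return errors


def generate_valid_bpmn(
    generate: Callable[[str], str],  # hypothetical LLM call: prompt -> BPMN-XML text
    description: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Request BPMN-XML, re-prompting with validation errors until it passes or attempts run out."""
    prompt = description
    for _ in range(max_attempts):
        candidate = generate(prompt)
        errors = bpmn_errors(candidate)
        if not errors:
            return candidate
        # Feed the concrete errors back so the next attempt can repair them
        prompt = (
            f"{description}\n\nYour previous BPMN-XML was invalid:\n"
            + "\n".join(errors)
            + "\nReturn corrected BPMN 2.0 XML only."
        )
    return None
```

Returning `None` after `max_attempts` lets the caller fall back to a human modeler instead of shipping an invalid file, and passing the specific error strings back gives the model something concrete to fix rather than simply resampling.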

Key insights

LLMs show strong syntactic and pragmatic BPMN generation but struggle with semantic accuracy and XML validity.

Method

BEF4LLM is a four-perspective framework (syntactic, pragmatic, semantic, validity) that uses 39 metrics to evaluate LLM-generated BPMN models against ground-truth models and to compare LLM performance with that of human experts.
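
With 105 samples and 5 runs each, every model produces 525 generations, so per-generation results have to be aggregated per perspective. The sketch below shows one possible aggregation under two assumptions of ours, not the paper's: quality scores are averaged only over valid generations, and validity is reported as the share of all generations that passed a validity check. The record layout and the `summarize` function are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# The three scored perspectives; validity is computed separately as a rate
PERSPECTIVES = ("syntactic", "pragmatic", "semantic")


def summarize(records: list[dict]) -> dict[str, float]:
    """Aggregate per-generation results for one model.

    Each record is a hypothetical dict such as
    {"sample": 12, "run": 3, "valid": True, "syntactic": 0.8, "pragmatic": 0.9, "semantic": 0.4}.
    Assumption: quality scores exist (and are averaged) only for valid generations.
    """
    if not records:
        raise ValueError("no generations to summarize")
    scores = defaultdict(list)
    for r in records:
        if r["valid"]:
            for p in PERSPECTIVES:
                scores[p].append(r[p])
    # Micro-average: every valid generation counts equally, regardless of sample
    summary = {p: mean(values) for p, values in scores.items()}
    # Share of all generations (samples x runs) that produced a valid file
    summary["validity"] = sum(r["valid"] for r in records) / len(records)
    return summary
```

Averaging only valid outputs keeps unparseable files from dragging down a model's quality scores, but it can flatter models with low validity; scoring invalid generations as 0 is the stricter alternative.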

Best for: AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.