Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed Models

2026-04-20 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Ontology Engineering & Knowledge Representation · Depth: Advanced, long

Summary

A cross-domain empirical study introduces CompCQ, a multi-dimensional framework for systematically characterizing Competency Questions (CQs) generated by Large Language Models (LLMs). CQs are central to requirement elicitation in ontology engineering, but they are traditionally written by hand, which is labor-intensive. The study evaluates CQs from five LLMs: three open models (KimiK2-1T, Llama 3.1-8B, and Llama 3.2-3B) and two closed models (Gemini 2.5 Pro and GPT 4.1). The evaluation spans five diverse domains, including cultural heritage and healthcare. It quantifies CQ properties such as readability (Flesch-Kincaid Grade Level and Dale-Chall Readability Score), structural complexity (requirement, linguistic, and syntactic), and relevance to the input text (rated by an LLM on a Likert scale). It also assesses semantic diversity and overlap using Sentence-BERT embeddings. The findings indicate that domain characteristics are the main factor shaping LLM generation behavior. Closed models offer greater stability and readability, while open models produce more diverse CQs.
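The summary names the readability scores but not how they are computed. As an illustration, the standard Flesch-Kincaid Grade Level formula is FKGL = 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. The minimal sketch below applies it to a single CQ. It assumes a naive vowel-group syllable counter, which is a simplification. Established tools such as the textstat Python package use more careful rules, and the study's own implementation may differ. Dale-Chall is omitted because it depends on a list of familiar words.

```python
import re

def count_syllables(word: str) -> int:
    """Heuristic syllable count: number of vowel groups (y counted as a vowel)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A final silent 'e' ("name") usually adds no syllable; "-le" ("table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level of a text, using the heuristic counter above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

print(round(flesch_kincaid_grade("Which artefacts are held by the museum?"), 2))  # 4.0
```

Short, single-sentence CQs tend to score low on this scale. The heuristic also undercounts some words (it gives "museum" two syllables), so scores from this sketch should only be compared with each other, not with published values.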

Key takeaway

AI scientists and ontology engineers evaluating LLMs for CQ generation should recognize that each model has a distinct generation profile, shaped largely by the domain. Closed models such as Gemini 2.5 Pro and GPT 4.1 tend to produce more readable and stable CQs, while open models offer greater diversity. Rather than relying on a single model, combine outputs from multiple LLMs and keep human review in the loop to ensure comprehensive and accurate coverage of requirements.

Key insights

The CompCQ framework systematically characterizes LLM-generated Competency Questions across linguistic, structural, and semantic dimensions.

Principles

Domain characteristics drive LLM generation profiles.
No single LLM captures the full requirements space.
Closed models offer stability and readability.

Method

The CompCQ framework quantifies CQ readability, complexity (requirement, linguistic, and syntactic), and LLM-rated relevance. It uses Sentence-BERT embeddings to measure semantic diversity within a CQ set (APS, ACD, and Shannon entropy) and to compare pairs of CQ sets (centroid similarity, coverage, and novelty).
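The method names these metrics without giving formulas, and the acronyms APS and ACD are not expanded here. The minimal sketch below makes several hypothetical assumptions. APS is treated as the average pairwise cosine similarity within one model's CQ set. ACD is treated as the average cosine distance of each CQ to the set centroid. Shannon entropy is computed over one discrete label per CQ, for example a cluster id. For set comparisons, coverage is the share of one set's CQs with a near match in the other, and novelty is the share without one, using a hypothetical 0.8 similarity threshold. Plain lists of floats stand in for Sentence-BERT vectors. In practice these could come from the sentence-transformers library's `SentenceTransformer(...).encode(...)`, though the summary does not say which model the study used. The KE-UniLiv/compcq repository is the place to check the actual definitions.

```python
import math
from collections import Counter
from itertools import combinations

Vector = list[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Within-set diversity for one model's CQ set (assumed definitions).

def average_pairwise_similarity(vectors: list[Vector]) -> float:
    """Hypothetical APS: mean cosine over all unordered pairs (lower = more diverse)."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two CQ embeddings")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def average_centroid_distance(vectors: list[Vector]) -> float:
    """Hypothetical ACD: mean cosine distance to the set centroid (higher = more spread)."""
    c = centroid(vectors)
    return sum(1.0 - cosine(v, c) for v in vectors) / len(vectors)

def shannon_entropy(labels: list[str]) -> float:
    """Entropy in bits of one discrete label per CQ, e.g. an assumed cluster id."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((k / total) * math.log2(k / total) for k in counts.values())

# Between-set comparison of model A's CQs against model B's (assumed definitions).

def best_match(v: Vector, others: list[Vector]) -> float:
    """Highest cosine similarity between v and any vector in others."""
    return max(cosine(v, o) for o in others)

def centroid_similarity(a: list[Vector], b: list[Vector]) -> float:
    """Cosine similarity between the two sets' centroids."""
    return cosine(centroid(a), centroid(b))

def coverage(a: list[Vector], b: list[Vector], threshold: float = 0.8) -> float:
    """Share of B's CQs that have a near match (similarity >= threshold) in A."""
    return sum(best_match(v, a) >= threshold for v in b) / len(b)

def novelty(a: list[Vector], b: list[Vector], threshold: float = 0.8) -> float:
    """Share of A's CQs with no near match in B."""
    return sum(best_match(v, b) < threshold for v in a) / len(a)

# Toy 2-D vectors standing in for Sentence-BERT embeddings.
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0, 0.0]]
print(average_pairwise_similarity(a))        # 0.0
print(round(centroid_similarity(a, b), 4))   # 0.7071
print(coverage(a, b), novelty(a, b))         # 1.0 0.5
```

Under these definitions, novelty of A against B equals one minus the coverage of A by B. This property makes the two numbers easy to cross-check.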

In practice

Combine multiple LLMs for comprehensive CQ generation.
Retain human-in-the-loop refinement for accuracy.
Use CompCQ to benchmark LLM output profiles.
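To illustrate the advice on combining models and keeping humans in the loop, here is a minimal sketch. It pools CQ lists from several models, drops near-duplicates, and records which model proposed each surviving question so a reviewer can inspect it. The sketch is not part of CompCQ as described here. It assumes a bag-of-words cosine as a stand-in for Sentence-BERT embeddings, so the example runs with the standard library only. The model names and the 0.8 threshold are hypothetical.

```python
import math
import re
from collections import Counter

def bow_vector(text: str) -> Counter:
    """Stand-in for a Sentence-BERT embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def merge_cq_sets(cq_sets: dict[str, list[str]],
                  threshold: float = 0.8) -> list[tuple[str, str]]:
    """Pool CQs from several models, keeping the first of each near-duplicate group.

    Returns (question, source_model) pairs to hand to a human reviewer.
    """
    kept: list[tuple[str, str, Counter]] = []
    for model, questions in cq_sets.items():
        for q in questions:
            vec = bow_vector(q)
            # Keep q only if it is not too similar to anything already kept.
            if all(cosine(vec, k_vec) < threshold for _, _, k_vec in kept):
                kept.append((q, model, vec))
    return [(q, model) for q, model, _ in kept]

if __name__ == "__main__":
    pooled = merge_cq_sets({
        "model_a": ["Which artefacts are held by the museum?",
                    "Who created the artefact?"],
        "model_b": ["Which artefacts are held by the museum?",
                    "When was the artefact restored?"],
    })
    for question, model in pooled:
        print(f"[{model}] {question}")
```

Greedy filtering depends on order: the model listed first keeps its wording for any near-duplicate question. A reviewer may therefore want to see whole near-duplicate groups, not just the survivors.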

Topics

Ontology Engineering
Competency Questions
Large Language Models
CompCQ Framework
Cross-Domain Empirical Study

Code references

KE-UniLiv/compcq

Best for: AI Scientist, Research Scientist, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.