DEFINED: A Data-Efficient Computational Framework for Fine-Grained Creativity Assessment in Debate Scenarios

2026-06-08 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, AI for Educational Assessment · Depth: Expert, extended

Summary

The DEFINED framework, developed by researchers at East China Normal University and Shanghai Innovation Institute, is a data-efficient computational method for fine-grained creativity assessment in debate scenarios. The paper, published for the KDD '26 conference, addresses the difficulty of evaluating creativity in complex, open-ended settings with a hierarchical eight-dimensional metric system: five creativity-specific dimensions and three non-creativity dimensions. The scorer is a Qwen2.5-7B-Instruct autoregressive language model fitted with a specialized scoring head. Training follows a mixed-granularity strategy that combines 60 fine-grained expert annotations with 4,000 coarse-grained samples, and a constrained data augmentation strategy mitigates "elite bias." In fine-grained assessment, DEFINED is reported to outperform prompt-based large language model evaluators and existing debate scoring methods, with an average Pearson Correlation Coefficient of 0.96 and an average Mean Squared Error of 43.09. The code is available on GitHub.
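The two figures quoted in the summary are standard agreement metrics between predicted and human-assigned scores. As a minimal sketch, assuming scores are plain lists of numbers (the function names `pearson` and `mse` are illustrative and not taken from the DEFINED repository), they can be computed like this:

```python
import math


def pearson(preds, targets):
    """Pearson Correlation Coefficient between two equal-length score lists."""
    n = len(preds)
    if n != len(targets) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_p = sum(preds) / n
    mean_t = sum(targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(preds, targets))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_t)


def mse(preds, targets):
    """Mean Squared Error between two equal-length score lists."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("need two non-empty lists of equal length")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
```

Pearson correlation measures whether the model ranks and spaces speeches the way the annotators do, while MSE depends on the score scale, so the two are usually read together.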

Key takeaway

For AI scientists and machine learning engineers building automated assessment tools, DEFINED shows one way to handle complex, subjective evaluations. Consider adopting a hierarchical metric system and a mixed-granularity training strategy to work around scarce fine-grained labels and improve model accuracy. The framework produces per-dimension scores alongside a holistic score, and it mitigates biases common in LLM-based evaluators, which supports more reliable, scalable assessment in educational or competitive settings.

Key insights

Fine-grained creativity assessment in complex domains can be automated with little expert-labeled data by combining hierarchical metrics with mixed-granularity LLM training.

Principles

Debate contexts offer ecologically valid creativity assessment.
Hierarchical metrics enable fine-grained evaluation.
Mixed-granularity training mitigates data scarcity.

Method

DEFINED uses a Qwen2.5-7B-Instruct LLM with a hierarchical scoring head. It is trained on mixed-granularity data (60 fine-grained expert annotations and 4,000 coarse-grained samples) plus augmented synthetic samples, and it predicts scores on eight dimensions together with a holistic score.
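One way to picture mixed-granularity training is a loss that always supervises the holistic score but supervises the eight dimension scores only when a sample carries fine-grained labels. The sketch below is an illustration under stated assumptions, not the paper's objective: the holistic score is assumed to be a weighted sum of the dimension scores, the error is assumed to be squared error, and `Sample`, `holistic_from_dims`, `mixed_granularity_loss`, and `dim_loss_weight` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_DIMS = 8  # five creativity-specific + three non-creativity dimensions


@dataclass
class Sample:
    holistic: float                     # coarse-grained label, always present
    dims: Optional[List[float]] = None  # fine-grained labels, None if absent


def holistic_from_dims(dim_scores, weights):
    """Assumed aggregation: holistic score as a weighted sum of dimensions."""
    return sum(w * s for w, s in zip(weights, dim_scores))


def mixed_granularity_loss(pred_dims_batch, samples, weights, dim_loss_weight=1.0):
    """Holistic squared error on every sample, plus per-dimension squared
    error averaged over the fine-grained samples only."""
    holistic_err = 0.0
    dim_err = 0.0
    n_fine = 0
    for pred_dims, sample in zip(pred_dims_batch, samples):
        pred_holistic = holistic_from_dims(pred_dims, weights)
        holistic_err += (pred_holistic - sample.holistic) ** 2
        if sample.dims is not None:  # fine-grained sample: supervise dimensions
            dim_err += sum(
                (p - t) ** 2 for p, t in zip(pred_dims, sample.dims)
            ) / NUM_DIMS
            n_fine += 1
    loss = holistic_err / len(samples)
    if n_fine:
        loss += dim_loss_weight * dim_err / n_fine
    return loss
```

Under this assumed design, coarse-grained samples still shape the dimension predictions indirectly, because their holistic error flows back through the aggregation.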

In practice

Use 8-dimensional metrics for nuanced evaluation.
Augment elite data with synthetic samples.
Combine limited fine-grained with abundant coarse-grained data.
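With 60 fine-grained items against 4,000 coarse-grained ones, a naive shuffle would leave most batches without any fine-grained supervision. The following sketch shows one possible batching scheme, assuming a fixed number of fine-grained items is drawn with replacement into every batch. The function `mixed_batches` and its parameters are hypothetical; the source does not describe how DEFINED mixes the two pools.

```python
import random


def mixed_batches(fine, coarse, batch_size, fine_per_batch, seed=0):
    """Yield batches that cover each coarse sample once and always contain
    `fine_per_batch` fine-grained samples (drawn with replacement)."""
    if not 0 < fine_per_batch < batch_size:
        raise ValueError("fine_per_batch must be between 1 and batch_size - 1")
    if not fine:
        raise ValueError("need at least one fine-grained sample")
    rng = random.Random(seed)
    coarse = list(coarse)  # copy so the caller's list is not reordered
    rng.shuffle(coarse)
    step = batch_size - fine_per_batch  # coarse slots per batch
    for start in range(0, len(coarse), step):
        batch = coarse[start:start + step]
        batch += rng.choices(fine, k=fine_per_batch)  # oversample the small pool
        rng.shuffle(batch)
        yield batch


# Usage with placeholder items shaped like the dataset sizes in the summary.
fine_pool = [("fine", i) for i in range(60)]
coarse_pool = [("coarse", i) for i in range(4000)]
batches = list(mixed_batches(fine_pool, coarse_pool, batch_size=32, fine_per_batch=4))
```

Sampling the small pool with replacement repeats each expert annotation many times per epoch, so in practice this would be paired with regularization or augmentation to limit overfitting to those 60 items.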

Topics

Creativity Assessment
Data-Efficient Learning
Automated Scoring
Debate Analysis
Large Language Models
Psychometric Modeling

Code references

tzwo/DEFINED

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.