DEFINED: A Data-Efficient Computational Framework for Fine-Grained Creativity Assessment in Debate Scenarios
Summary
The DEFINED framework, developed by researchers at East China Normal University and Shanghai Innovation Institute, is a data-efficient computational method for fine-grained creativity assessment in debate scenarios. Published for the KDD '26 conference, DEFINED addresses the challenge of evaluating creativity in complex, open-ended environments with a hierarchical eight-dimensional metric system. The system comprises five creativity-specific and three non-creativity dimensions and is implemented with a Qwen2.5-7B-Instruct autoregressive language model plus a specialized scoring head. Training follows a mixed-granularity strategy that combines 60 fine-grained expert annotations with 4,000 coarse-grained samples, and a constrained data augmentation strategy mitigates "elite bias." DEFINED significantly outperforms prompt-based large language model evaluators and existing debate scoring methods, achieving an average Pearson Correlation Coefficient of 0.96 and an average Mean Squared Error of 43.09 in fine-grained assessment. Its code is available on GitHub.
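The two reported metrics can be made concrete with a short sketch. The function names and the toy scores below are illustrative assumptions, not taken from the DEFINED code base; the summary does not state the score scale or how the averages across dimensions were computed.

```python
import math
from typing import Sequence


def pearson(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson correlation between predicted and expert scores."""
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(pred) / len(pred)
    mean_g = sum(gold) / len(gold)
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant sequences")
    return cov / math.sqrt(var_p * var_g)


def mse(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Mean squared error between predicted and expert scores."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred)


# Hypothetical scores for one dimension across four debate speeches.
predicted = [62.0, 75.0, 81.0, 90.0]
expert = [60.0, 78.0, 80.0, 93.0]
r = pearson(predicted, expert)
err = mse(predicted, expert)
```

Pearson correlation rewards getting the ranking and spacing of scores right, while MSE penalizes absolute deviation, so the two are usually reported together.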
Key takeaway
For AI Scientists or Machine Learning Engineers developing automated assessment tools, DEFINED offers a robust approach to complex, subjective evaluations. You should consider adopting a hierarchical metric system and a mixed-granularity training strategy to overcome data scarcity and improve model accuracy. This framework provides fine-grained insights into performance. It also mitigates biases common in LLM-based evaluators, enabling more reliable, scalable assessment in educational or competitive settings.
Key insights
Fine-grained creativity assessment in complex domains can be automated with little expert-labeled data by combining hierarchical metrics with mixed-granularity LLM training.
Principles
- Debate contexts offer ecologically valid creativity assessment.
- Hierarchical metrics enable fine-grained evaluation.
- Mixed-granularity training mitigates data scarcity.
Method
DEFINED uses a Qwen2.5-7B-Instruct LLM with a hierarchical scoring head, trained on mixed-granularity data (60 fine-grained, 4,000 coarse-grained) and augmented synthetic samples, to predict 8-dimensional and holistic scores.
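One way to picture mixed-granularity supervision is a loss that always supervises the holistic score and adds per-dimension terms only when expert dimension labels exist. The sketch below is an assumption-laden illustration, not the paper's objective: it assumes coarse-grained samples carry only a holistic score, that the holistic prediction is a weighted sum of the eight dimension predictions, and that squared error is used. `Sample`, `holistic_from_dims`, `sample_loss`, and `dim_weight` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

NUM_DIMS = 8  # five creativity-specific + three non-creativity dimensions


@dataclass
class Sample:
    holistic: float                         # overall score, always present
    dims: Optional[Sequence[float]] = None  # eight expert scores, or None if coarse-grained


def holistic_from_dims(pred_dims: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed hierarchy: the holistic score is a weighted sum of dimension scores."""
    if len(pred_dims) != NUM_DIMS or len(weights) != NUM_DIMS:
        raise ValueError(f"expected {NUM_DIMS} dimension scores and weights")
    return sum(w * d for w, d in zip(weights, pred_dims))


def sample_loss(pred_dims: Sequence[float], sample: Sample,
                weights: Sequence[float], dim_weight: float = 1.0) -> float:
    """Squared-error loss that uses whatever supervision the sample provides."""
    pred_holistic = holistic_from_dims(pred_dims, weights)
    loss = (pred_holistic - sample.holistic) ** 2
    if sample.dims is not None:
        # Fine-grained sample: also supervise each of the eight dimensions.
        dim_err = sum((p - t) ** 2 for p, t in zip(pred_dims, sample.dims)) / NUM_DIMS
        loss += dim_weight * dim_err
    return loss


uniform = [1.0 / NUM_DIMS] * NUM_DIMS
prediction = [80.0] * NUM_DIMS
coarse_loss = sample_loss(prediction, Sample(holistic=70.0), uniform)
fine_loss = sample_loss(prediction, Sample(holistic=70.0, dims=[70.0] * NUM_DIMS), uniform)
```

Under these assumptions, the coarse-grained samples still shape the dimension predictions indirectly, because the holistic error flows back through the weighted sum.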
In practice
- Use 8-dimensional metrics for nuanced evaluation.
- Augment elite data with synthetic samples.
- Combine limited fine-grained with abundant coarse-grained data.
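When combining a small fine-grained pool with a much larger coarse-grained one, uniform sampling would show the model a fine-grained example only about 1.5% of the time (60 of 4,060). A common workaround, offered here as an assumption rather than as DEFINED's actual procedure, is to reserve a fixed number of slots per batch for fine-grained samples. The function `mixed_batches` and its parameters are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def mixed_batches(fine: Sequence[T], coarse: Sequence[T], batch_size: int,
                  fine_per_batch: int, num_batches: int, seed: int = 0) -> Iterator[List[T]]:
    """Yield batches with a fixed number of fine-grained samples in each."""
    if not 0 <= fine_per_batch <= batch_size:
        raise ValueError("fine_per_batch must be between 0 and batch_size")
    rng = random.Random(seed)
    for _ in range(num_batches):
        # The small pool is sampled with replacement, so it is revisited often.
        batch = rng.choices(fine, k=fine_per_batch)
        # The large pool is sampled without replacement within a batch.
        batch += rng.sample(list(coarse), k=batch_size - fine_per_batch)
        rng.shuffle(batch)
        yield batch


# Placeholder records standing in for annotated debate speeches.
fine_pool = [("fine", i) for i in range(60)]
coarse_pool = [("coarse", i) for i in range(4000)]
batches = list(mixed_batches(fine_pool, coarse_pool, batch_size=16,
                             fine_per_batch=4, num_batches=3))
```

Oversampling the expert-annotated pool this heavily raises the risk of overfitting to it, which is one reason to pair a scheme like this with augmentation or regularization.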
Topics
- Creativity Assessment
- Data-Efficient Learning
- Automated Scoring
- Debate Analysis
- Large Language Models
- Psychometric Modeling
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.