Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers
Summary
A large-scale, systematic study evaluates the effectiveness of human creativity tests for predicting the creative achievement of large language models (LLMs) across three constructs: creative writing, divergent thinking, and scientific ideation. The research finds that the Divergent Association Task (DAT) and Conditional DAT are the most effective predictors for creative writing and divergent thinking, respectively. However, test effectiveness varies significantly by construct, and no single existing test reliably predicts all constructs, particularly scientific ideation. To address this gap, the study introduces the Divergent Remote Association Test (DRAT), a novel vocabulary-space test designed to assess both convergent and divergent thinking. The DRAT is identified as the first and only creativity test for LLMs that significantly predicts scientific ideation ability, demonstrating robustness across various design choices and outperforming linear combinations of existing tests.
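In the commonly used human version, the DAT asks for 10 maximally unrelated nouns and scores the first seven valid ones as the average pairwise semantic distance between their word embeddings, multiplied by 100. This summary does not describe how the study prompts LLMs or which embedding model it uses. The sketch below is therefore a minimal illustration of DAT-style scoring under stated assumptions: the function names `dat_score` and `cosine_distance`, the rule that "valid" means "present in the embedding table", and the 3-dimensional toy vectors are all hypothetical stand-ins for a real word-vector model.

```python
import itertools
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dat_score(words, embeddings, n_words=7):
    """DAT-style score: mean pairwise cosine distance of the first
    n_words valid, distinct words, scaled by 100.

    Assumption: a word is "valid" here only if it appears in `embeddings`;
    the human DAT also applies rules such as accepting only single nouns.
    """
    valid = []
    for raw in words:
        word = raw.strip().lower()
        if word in embeddings and word not in valid:
            valid.append(word)
        if len(valid) == n_words:
            break
    if len(valid) < n_words:
        raise ValueError(f"need {n_words} valid words, got {len(valid)}")
    pairs = list(itertools.combinations(valid, 2))
    total = sum(cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs)
    return 100.0 * total / len(pairs)


# Toy 3-d "embeddings" standing in for a real word-vector model (hypothetical).
toy_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "rocket": [0.0, 1.0, 0.0],
    "violin": [0.0, 0.0, 1.0],
}

print(dat_score(["Cat", "rocket", "violin"], toy_embeddings, n_words=3))  # 100.0
print(dat_score(["cat", "dog", "rocket"], toy_embeddings, n_words=3))  # lower: cat and dog are close
```

Mutually orthogonal toy vectors give the maximum score of 100; swapping in a near-synonym ("dog" for "violin") pulls the average distance, and so the score, down.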
Key takeaway
If you develop or evaluate LLMs for creative applications, recognize that traditional human creativity tests have limited validity for machine creativity, especially for scientific ideation. If scientific ideation is a key performance indicator for your LLM, add the DRAT to your evaluation pipeline: it uniquely predicts this ability, assessing both convergent and divergent thinking in a single test.
Key insights
Existing human creativity tests poorly predict LLM scientific ideation, necessitating new, integrated assessment tools.
Principles
- Test validity varies significantly by creative construct.
- No single test predicts all creative abilities well.
- Assessing convergent and divergent thinking together is key to predicting scientific ideation.
Method
The study conducts a large-scale assessment of human creativity tests on LLMs, then introduces the Divergent Remote Association Test (DRAT) to measure both convergent and divergent thinking in a single instrument.
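Judging whether a test "predicts" a construct comes down to checking, across many models, whether test scores and construct scores rise together. The summary does not say which statistic the study used, so as an assumption this sketch uses a Spearman rank correlation (with average ranks for ties) between per-model test scores and per-model construct scores. The function names and all the numbers below are hypothetical, made up purely for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up per-model scores, for illustration only (one entry per LLM).
test_scores = [71.2, 78.5, 80.1, 83.4]       # e.g. a creativity-test score
construct_scores = [0.42, 0.55, 0.51, 0.63]  # e.g. a rated ideation score
rho = spearman(test_scores, construct_scores)
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = 0.80"
```

A rank correlation is a natural choice when test and construct scores live on unrelated scales, since only the ordering of models matters. With only a handful of models, a correlation of this size is not necessarily significant; a permutation test or a standard significance test would be needed before claiming that a test "significantly predicts" a construct.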
In practice
- Use DAT for creative writing assessment.
- Use Conditional DAT for divergent thinking.
- Employ DRAT for scientific ideation evaluation.
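The summary says the DRAT works in vocabulary space and measures convergent and divergent thinking in one instrument, but it does not give the scoring rule. Purely as a hypothetical illustration of why one instrument can capture both, the toy below rewards answers that connect to every cue word (convergent) and answers that differ from each other (divergent). The function name `toy_convergent_divergent`, the "similarity to the least-related cue" rule, and the vectors are all assumptions; this is not the DRAT's published method.

```python
import itertools
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def toy_convergent_divergent(cues, answers, embeddings):
    """HYPOTHETICAL toy scoring, NOT the DRAT's published method.

    convergent: for each answer, its similarity to the least-related cue
                (so an answer must connect to every cue), averaged over answers.
    divergent:  mean pairwise cosine distance among the answers.
    """
    cue_vecs = [embeddings[c] for c in cues]
    ans_vecs = [embeddings[a] for a in answers if a in embeddings]
    if not cue_vecs or len(ans_vecs) < 2:
        raise ValueError("need at least one cue and two known answers")
    convergent = sum(
        min(cosine_similarity(a, c) for c in cue_vecs) for a in ans_vecs
    ) / len(ans_vecs)
    pairs = list(itertools.combinations(ans_vecs, 2))
    divergent = sum(1.0 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)
    return convergent, divergent


# Toy vectors (hypothetical), not a real embedding model.
toy = {
    "sea": [1.0, 0.0, 0.0],
    "salt": [0.0, 1.0, 0.0],
    "brine": [1.0, 1.0, 0.0],
    "tide": [1.0, 1.0, 1.0],
    "violin": [0.0, 0.0, 1.0],
}
print(toy_convergent_divergent(["sea", "salt"], ["brine", "tide"], toy))
print(toy_convergent_divergent(["sea", "salt"], ["brine", "violin"], toy))
```

The first answer set links to both cues but clusters together; the second spreads out, but "violin" ignores the cues. A score built on only one component would rank the two sets in opposite orders, which is the intuition behind measuring both at once.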
Topics
- Large Language Models
- Creativity Assessment
- Divergent Thinking
- Scientific Ideation
- Divergent Remote Association Test
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published in the journal Artificial Intelligence.