CREATE: Testing LLMs for Associative Creativity

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, quick

Summary

CREATE is a new benchmark for evaluating the creative associative reasoning of large language models. It requires models to generate multiple distinct, meaningful paths that connect different concepts, drawing on their parametric knowledge. Models are scored on the specificity and diversity of the paths they generate, so a larger set of strong, varied connections earns a higher score. The task mirrors real-world creative challenges such as hypothesis generation: the search space is vast, yet answers can still be graded objectively. Initial evaluations of frontier models indicate that stronger models achieve higher creative utility, but the benchmark is hard to saturate because of the task's complexity and the multiplicity of valid answers. "Thinking models" do not consistently outperform others, even with larger token budgets, and current creative prompting techniques offer only marginal improvements.

Key takeaway

Research scientists who develop or evaluate large language models should consider integrating CREATE into their assessment pipelines to test associative reasoning rigorously. The benchmark offers an objective framework for measuring creative utility and highlights areas where current models, including "thinking models," still struggle. Methods that improve path specificity and diversity will be crucial for advancing model creativity.

Key insights

CREATE evaluates models' creative associative reasoning by having them generate diverse, specific paths between concepts.

Principles

Method

Models draw on their parametric knowledge to generate multiple distinct paths connecting concepts. The paths are scored on specificity and diversity, and a larger set of strong, diverse paths earns a higher score.
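As a rough illustration of how specificity and diversity could be combined into one score, here is a minimal Python sketch. It is not the benchmark's actual metric, which this summary does not specify. The function names, the Jaccard-based diversity measure, the `min_distance` threshold, and the assumption that each path already has a specificity score in [0, 1] are all hypothetical choices made for the example.

```python
from typing import Dict, List, Sequence, Tuple

# A path is a tuple of concept names: (start, intermediate..., end).
Path = Tuple[str, ...]


def jaccard_distance(a: Path, b: Path) -> float:
    """Distance between two paths based on their intermediate concepts only."""
    inner_a, inner_b = set(a[1:-1]), set(b[1:-1])
    union = inner_a | inner_b
    if not union:
        # Two direct links with no intermediates are treated as identical.
        return 0.0
    return 1.0 - len(inner_a & inner_b) / len(union)


def creative_utility(
    paths: Sequence[Path],
    specificity: Dict[Path, float],  # hypothetical per-path scores in [0, 1]
    min_distance: float = 0.5,       # hypothetical diversity threshold
) -> Tuple[float, List[Path]]:
    """Greedily keep the most specific paths that differ enough from those
    already kept, and return (sum of kept specificity scores, kept paths)."""
    kept: List[Path] = []
    total = 0.0
    for path in sorted(paths, key=lambda p: specificity[p], reverse=True):
        if all(jaccard_distance(path, other) >= min_distance for other in kept):
            kept.append(path)
            total += specificity[path]
    return total, kept


if __name__ == "__main__":
    # Illustrative data only; the scores are made up for the example.
    p1 = ("coffee", "caffeine", "adenosine", "sleep")
    p2 = ("coffee", "caffeine", "adenosine", "melatonin", "sleep")
    p3 = ("coffee", "night shift", "sleep")
    scores = {p1: 1.0, p2: 0.75, p3: 0.5}
    total, kept = creative_utility([p1, p2, p3], scores)
    print(total, kept)
```

In this toy run, the second path shares most of its intermediate concepts with the first, so it is treated as a near-duplicate and adds nothing, while the third path takes a different route and does count. That reflects the idea in the summary that more strong, varied connections should score higher than many similar ones.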

Best for: Research Scientist, AI Researcher, AI Scientist, Prompt Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.