"I've Seen How This Goes": Characterizing Diversity via Progressive Conditional Surprise
Summary
A new diversity measurement approach, "Decan" ($D_{Ca_n} = C \times a_n$), has been proposed to characterize the diversity of creative outputs. Published on 2026-06-01, the method is designed to evaluate post-training mode collapse, compare decoding strategies, and quantify creative behavior in both AI and human writing. "Decan" uses in-context learning to derive a per-byte score from the per-token log-probabilities of a base model $\theta$, with a single forward pass per permutation, so it needs no embedding model, reference corpus, or human labels. Grounded in information theory, it detects a wide range of similarities without training a special-purpose model. On the human-grounded McDiv benchmark, $D_{Ca_n}$ achieved an OCA of 0.846 on the prompt_gen set, below SentBERT's 0.897. It also detected a monotonic loss of diversity across the OLMo-2-7B post-training pipeline (base $\to$ SFT $\to$ DPO $\to$ RLVR stages), which indicates its relevance for creative-writing applications.
Key takeaway
For machine learning engineers evaluating generative model diversity or comparing decoding strategies, the Decan metric offers a streamlined approach that requires only a base language model. You can quantify diversity loss across your post-training pipelines (e.g., SFT, DPO, RLVR) without separate embedding models, reference corpora, or human labels. This allows for an efficient, information-theoretic assessment of the diversity of creative outputs, helping you identify and mitigate mode collapse earlier in development.
Key insights
Diversity of creative outputs can be characterized by progressive conditional surprise using in-context learning from a base language model.
Principles
- Diversity measurement can leverage in-context learning.
- Information theory offers a robust metric foundation.
- Single-pass scoring avoids external models.
Method
The "Decan" metric ($D_{Ca_n} = C \times a_n$) computes a per-byte score from the per-token log-probabilities of a base model $\theta$, using a single forward pass per permutation of the outputs.
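The idea of progressive conditional surprise can be illustrated with a minimal sketch. This is not the authors' implementation and it does not reproduce the $D_{Ca_n}$ formula, whose terms this summary does not define. It only shows how per-token log-probabilities might be turned into a per-byte surprise for each output, conditioned on the outputs that precede it in one ordering. The names `LogProbFn`, `bits_per_byte`, `progressive_surprise`, and `toy_logprob_fn` are hypothetical, and the toy scorer stands in for a real base model. A real setup would presumably score the whole concatenated permutation in one forward pass rather than call the scorer once per output.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical scorer interface (an assumption, not an API from the paper):
# given a context and a continuation, return the natural-log probability
# of each token in the continuation.
LogProbFn = Callable[[str, str], List[float]]


def bits_per_byte(logprobs: Sequence[float], text: str) -> float:
    """Convert summed natural-log token probabilities to bits per UTF-8 byte.

    Assumes `text` is non-empty.
    """
    n_bytes = len(text.encode("utf-8"))
    return -sum(logprobs) / (math.log(2) * n_bytes)


def progressive_surprise(
    outputs: Sequence[str], logprob_fn: LogProbFn, sep: str = "\n\n"
) -> List[float]:
    """Per-byte surprise of each output given all earlier outputs in this order."""
    curve: List[float] = []
    context = ""
    for text in outputs:
        curve.append(bits_per_byte(logprob_fn(context, text), text))
        context += text + sep  # later outputs see this one in context
    return curve


def toy_logprob_fn(context: str, continuation: str) -> List[float]:
    """Toy stand-in for a base model: words already seen in context are likely."""
    seen = set(context.split())
    return [
        math.log(0.5) if tok in seen else math.log(0.01)
        for tok in continuation.split()
    ]


# A varied set keeps surprising the scorer; a collapsed set stops doing so.
diverse_curve = progressive_surprise(
    ["the cat sat", "a dog ran", "birds fly south"], toy_logprob_fn
)
collapsed_curve = progressive_surprise(["the cat sat"] * 3, toy_logprob_fn)
```

Under these assumptions, the surprise for the collapsed set drops sharply after the first output, while the varied set stays high at every position. Normalizing by bytes rather than tokens keeps the score comparable across texts that tokenize differently.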
In practice
- Assess mode collapse in post-training pipelines.
- Benchmark generative AI decoding strategies.
- Quantify creativity in human and AI text.
Topics
- Diversity Measurement
- In-context Learning
- Language Models
- Mode Collapse
- Generative AI
- Information Theory
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.