LLM Predictive Scoring and Validation: Inferring Experience Ratings from Unstructured Text

2026-04-15 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, quick

Summary

A study used GPT-4.1 to predict baseball fan experience ratings on a 0-10 scale from unstructured text responses alone, drawing on approximately 10,000 fans across five Major League Baseball teams. The predictions aligned closely with self-reported ratings: 67% fell within one point and 36% matched exactly. Across three independent scoring runs, the predictions were nearly deterministic, with 87% exact agreement and 99.9% agreement within one point. The predicted ratings correlated most strongly with the overall experience rating (r = 0.82). However, predictions ran about one point lower than self-reports. The study attributes this gap to a construct difference: self-reported ratings reflect an overall evaluative judgment, while predicted ratings quantify the impact of salient, emotionally intense, or unusual moments.
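The run-to-run consistency figures can be illustrated with a minimal sketch. The summary does not say how agreement across the three runs was computed, so this assumes an item counts as an exact agreement when all runs give the same score, and as within one point when the highest and lowest scores differ by at most 1. The function name and the toy scores are hypothetical.

```python
def run_consistency(runs):
    """Share of items on which repeated scoring runs agree.

    runs: list of equal-length lists of integer scores, one list per run.
    Assumption: agreement is judged per item across all runs at once,
    using the spread (max minus min) of that item's scores.
    """
    n = len(runs[0])
    if n == 0 or any(len(r) != n for r in runs):
        raise ValueError("runs must be non-empty and of equal length")

    exact = 0
    within_one = 0
    for scores in zip(*runs):  # one tuple of scores per item
        spread = max(scores) - min(scores)
        exact += spread == 0
        within_one += spread <= 1

    return {"exact": exact / n, "within_one": within_one / n}


# Toy scores for five responses across three runs (invented for illustration).
runs = [
    [7, 8, 3, 9, 5],
    [7, 8, 4, 9, 5],
    [7, 8, 3, 9, 5],
]
result = run_consistency(runs)
print(result)
```

A pairwise definition (averaging agreement over each pair of runs) would be an equally reasonable reading and can give slightly different numbers.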

Key takeaway

Research scientists developing sentiment analysis models should not assume that a systematic discrepancy between predicted and self-reported scores is an error to eliminate. Such a gap can reflect a construct difference, for example between an overall evaluative judgment and the impact of salient moments. Treating the two scores as complementary measures can yield richer, multi-faceted insights from unstructured text data.

Key insights

LLMs can directionally predict experience ratings from unstructured text, revealing distinct evaluative constructs.

Principles

Unoptimized prompts yield directional predictions.
Systematic gaps can indicate construct differences.

Method

GPT-4.1 predicted a 0-10 experience rating from each fan's single open-ended text response. The predictions were compared with the self-reported survey scores to assess accuracy, and scoring was repeated across three independent runs to assess consistency.
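The comparison between predicted and self-reported scores can be sketched as follows. This is an illustrative calculation of the four quantities the summary reports (exact match rate, within-one rate, mean signed difference, and Pearson correlation), not the study's own code. The function name and the toy scores are hypothetical.

```python
from math import sqrt


def agreement_metrics(predicted, reported):
    """Compare predicted scores with self-reported scores on the same scale."""
    n = len(predicted)
    if n == 0 or n != len(reported):
        raise ValueError("inputs must be non-empty and of equal length")

    diffs = [p - r for p, r in zip(predicted, reported)]
    mean_p = sum(predicted) / n
    mean_r = sum(reported) / n

    # Pearson correlation from sums of squared deviations.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reported))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reported)
    pearson = cov / sqrt(var_p * var_r) if var_p and var_r else float("nan")

    return {
        "exact": sum(d == 0 for d in diffs) / n,
        "within_one": sum(abs(d) <= 1 for d in diffs) / n,
        # Negative values mean predictions run lower than self-reports.
        "mean_diff": sum(diffs) / n,
        "pearson_r": pearson,
    }


# Toy scores (invented for illustration, not the study's data).
predicted = [6, 8, 4, 9, 7, 5]
reported = [7, 9, 6, 9, 8, 5]
metrics = agreement_metrics(predicted, reported)
print(metrics)
```

Reporting the mean signed difference alongside the correlation matters here: a high correlation with a consistently negative difference is the pattern the summary describes as a systematic gap rather than random error.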

In practice

Use LLMs for initial sentiment scoring.
Analyze rating discrepancies for deeper insights.
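One way to start analyzing discrepancies is to surface the responses with the largest gaps for manual reading. The sketch below is an assumption about how that might be done, not a procedure from the study. The record keys (`text`, `predicted`, `reported`), the `min_gap` threshold, and the placeholder records are all hypothetical.

```python
def largest_gaps(records, min_gap=2):
    """Return records whose predicted and reported scores differ by at
    least min_gap points, largest absolute gap first.

    records: list of dicts with hypothetical keys
             'text', 'predicted', and 'reported'.
    """
    flagged = []
    for rec in records:
        gap = rec["predicted"] - rec["reported"]
        if abs(gap) >= min_gap:
            # Copy the record so the input list is left unchanged.
            flagged.append({**rec, "gap": gap})
    return sorted(flagged, key=lambda r: abs(r["gap"]), reverse=True)


# Placeholder records (invented for illustration).
records = [
    {"text": "response A", "predicted": 4, "reported": 8},
    {"text": "response B", "predicted": 7, "reported": 7},
    {"text": "response C", "predicted": 5, "reported": 7},
    {"text": "response D", "predicted": 9, "reported": 8},
]
flagged = largest_gaps(records)
for rec in flagged:
    print(rec["text"], rec["gap"])
```

Reading the flagged texts can show whether a low prediction tracks a single salient negative moment in an otherwise well-rated visit, which is the kind of construct difference the summary describes.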

Topics

LLM Predictive Scoring
Experience Ratings
Unstructured Text Analysis
GPT-4.1
Fan Experience

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.