Creation of the Estonian Subjectivity Dataset: Assessing the Degree of Subjectivity on a Scale

2026-06-08 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

A new Estonian Subjectivity Dataset has been created, comprising 1,000 documents: 300 journalistic articles and 700 randomly selected web texts. Four human annotators rated each document for subjectivity on a continuous scale from 0 (fully objective) to 100 (fully subjective). Initial inter-annotator correlations were moderate, ranging from 0.525 to 0.675, but improved to 0.678 after texts with highly divergent scores were re-annotated. An experimental automatic annotation with GPT-5 produced scores whose correlations with human ratings ranged from 0.601 to 0.801. However, GPT-5 never assigned scores above 98 and differed systematically from the human annotators: it rated texts containing direct quotes higher and gave less weight to colloquial language. Human judgments were also found to be influenced by the order in which texts were presented.
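The agreement figures above are pairwise correlations between annotators. A minimal sketch of how such figures, and a list of candidate texts for re-annotation, could be computed is shown here. It assumes Pearson correlation and a simple max-minus-min spread rule; the summary does not say which coefficient or re-annotation criterion the authors used, and the ratings and the `max_spread` threshold are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(ratings):
    """ratings maps annotator name -> scores, aligned by document index."""
    return {
        (a, b): pearson(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


def divergent_documents(ratings, max_spread):
    """Indices of documents whose scores differ by more than max_spread."""
    names = sorted(ratings)
    n_docs = len(ratings[names[0]])
    flagged = []
    for i in range(n_docs):
        scores = [ratings[name][i] for name in names]
        if max(scores) - min(scores) > max_spread:
            flagged.append(i)
    return flagged


# Illustrative 0-100 ratings from four hypothetical annotators.
ratings = {
    "A": [10, 40, 80, 95, 20],
    "B": [15, 35, 70, 90, 60],
    "C": [5, 50, 85, 99, 25],
    "D": [12, 45, 75, 92, 30],
}

for pair, r in pairwise_correlations(ratings).items():
    print(pair, round(r, 3))
print("re-annotate:", divergent_documents(ratings, max_spread=30))
```

With four annotators this yields six pairwise coefficients; the spread rule flags only the last document, where one annotator scored far higher than the other three.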

Key takeaway

NLP engineers building subjectivity analysis systems should recognize that human annotation is inherently variable and context-dependent, which calls for robust aggregation strategies. LLMs like GPT-5 can produce plausible scores, but they treat direct quotes and colloquial language differently from human annotators, so their output should not be treated as a direct replacement for human ratings. Instead, consider LLMs as complementary tools for initial scoring or large-scale automation, and always validate against human benchmarks, especially for nuanced linguistic research.
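One way to read "robust aggregation" is to combine the four scores per document with a statistic that a single outlying annotator cannot dominate. The sketch below contrasts the median with the mean on invented scores; it is an illustration of the idea, not the aggregation the dataset authors applied, which the summary does not specify.

```python
from statistics import mean, median


def aggregate(scores_per_doc, method="median"):
    """Collapse each document's annotator scores into one value."""
    reducers = {"median": median, "mean": mean}
    if method not in reducers:
        raise ValueError(f"unknown method: {method}")
    return [reducers[method](scores) for scores in scores_per_doc]


# Two hypothetical documents, four annotator scores each (0-100 scale).
scores_per_doc = [
    [10, 15, 5, 12],   # annotators broadly agree
    [20, 60, 25, 30],  # one annotator is far above the rest
]

print(aggregate(scores_per_doc, "median"))  # [11.0, 27.5]
print(aggregate(scores_per_doc, "mean"))    # [10.5, 33.75]
```

On the second document the single score of 60 pulls the mean up by more than six points relative to the median, which is the kind of sensitivity a robust aggregate is meant to limit.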

Key insights

Human subjectivity annotation is inherently variable and influenced by context, while LLMs offer plausible but distinct scoring.

Principles

Subjectivity is best captured on a continuous scale.
Contextual factors influence human annotation consistency.
LLMs interpret subjectivity differently from humans.

Method

A dataset of 1,000 texts was annotated on a continuous 0-100 subjectivity scale by four human annotators, and a subset was re-annotated. GPT-5 also scored the texts, returning its output in a format constrained by a JSON schema.
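The summary does not reproduce the JSON schema used with GPT-5, so the following is a hypothetical one: a single integer field named `score` bounded to the 0-100 scale. The sketch validates a raw model response with the standard library only and deliberately omits the model call itself; the field name, the integer type, and the `parse_score` helper are all assumptions made for illustration.

```python
import json

# Hypothetical schema: not taken from the paper.
SUBJECTIVITY_SCHEMA = {
    "type": "object",
    "properties": {
        "score": {"type": "integer", "minimum": 0, "maximum": 100},
    },
    "required": ["score"],
    "additionalProperties": False,
}


def parse_score(raw: str) -> int:
    """Parse a raw JSON response and enforce the schema's constraints by hand."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or set(payload) != {"score"}:
        raise ValueError("expected an object with exactly one field: 'score'")
    score = payload["score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError("'score' must be an integer")
    bounds = SUBJECTIVITY_SCHEMA["properties"]["score"]
    if not bounds["minimum"] <= score <= bounds["maximum"]:
        raise ValueError("'score' must lie between 0 and 100")
    return score


print(parse_score('{"score": 42}'))
```

Rejecting out-of-range or malformed responses at parse time keeps invalid scores out of any later correlation with human ratings.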

In practice

Use continuous scales for nuanced subjective tasks.
Randomize text order to mitigate human context effects.
Compare LLM outputs to human baselines for divergence.
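The last two recommendations can be sketched briefly: give each annotator an independently shuffled but reproducible presentation order, and measure how far LLM scores sit from a human baseline. This is a minimal illustration; the seeding scheme, the `threshold` value, and the example scores are assumptions rather than details from the paper.

```python
import random


def presentation_order(doc_ids, annotator, seed=0):
    """Reproducible shuffled order, seeded per annotator."""
    rng = random.Random(f"{seed}:{annotator}")
    order = list(doc_ids)
    rng.shuffle(order)
    return order


def divergence(human_scores, llm_scores, threshold):
    """Mean signed difference (LLM minus human) and indices beyond threshold."""
    diffs = [llm - human for human, llm in zip(human_scores, llm_scores)]
    bias = sum(diffs) / len(diffs)
    outliers = [i for i, d in enumerate(diffs) if abs(d) > threshold]
    return bias, outliers


# Each annotator sees the same documents in a different, repeatable order.
for annotator in ("A", "B"):
    print(annotator, presentation_order(range(6), annotator))

# Hypothetical aggregated human scores versus LLM scores for three documents.
human = [10, 50, 90]
llm = [20, 55, 60]
bias, outliers = divergence(human, llm, threshold=20)
print("bias:", bias, "outliers:", outliers)
```

A signed bias near zero can hide large per-document disagreements, so the outlier list is worth inspecting alongside the correlation coefficient.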

Topics

Estonian Subjectivity Dataset
Document-level Subjectivity
LLM Annotation
Inter-annotator Agreement
Natural Language Processing
GPT-5

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.