Creation of the Estonian Subjectivity Dataset: Assessing the Degree of Subjectivity on a Scale
Summary
A new Estonian Subjectivity Dataset has been created, comprising 1,000 documents: 300 journalistic articles and 700 randomly selected web texts. Four human annotators rated each document for subjectivity on a continuous scale from 0 (fully objective) to 100 (fully subjective). Initial inter-annotator correlations were moderate, ranging from 0.525 to 0.675, but improved to 0.678 after texts with highly divergent scores were re-annotated. An experimental automatic annotation with GPT-5 produced scores whose correlations with the human ratings ranged from 0.601 to 0.801. However, GPT-5 never assigned a score above 98 and differed from the human annotators in systematic ways: it rated texts containing direct quotes higher and gave less weight to colloquial language. Human judgments were also found to be influenced by the order in which texts were presented.
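As an illustration of the agreement figures above, the following is a minimal Python sketch of how pairwise inter-annotator correlations might be computed and how widely divergent texts might be flagged for re-annotation. The summary does not say which correlation coefficient or which divergence criterion was used, so Pearson correlation, the `max_range` threshold, and all function names here are assumptions, and the ratings are invented toy values rather than data from the dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(ratings):
    """Map each annotator pair to the correlation of their scores.

    `ratings` maps an annotator name to a list of 0-100 scores,
    with every list in the same document order.
    """
    return {
        (a, b): pearson(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


def divergent_documents(ratings, max_range=40):
    """Return indices of documents whose score range exceeds `max_range`.

    The threshold is a hypothetical choice, not the paper's criterion.
    """
    annotators = list(ratings)
    n_docs = len(ratings[annotators[0]])
    flagged = []
    for i in range(n_docs):
        scores = [ratings[name][i] for name in annotators]
        if max(scores) - min(scores) > max_range:
            flagged.append(i)
    return flagged


# Toy ratings for four annotators and four documents (invented values).
toy_ratings = {
    "A": [10, 80, 55, 20],
    "B": [15, 70, 60, 90],
    "C": [5, 85, 50, 25],
    "D": [20, 75, 45, 30],
}
correlations = pairwise_correlations(toy_ratings)
to_reannotate = divergent_documents(toy_ratings)
```

With four annotators there are six pairs to report, and in the toy data only the last document is flagged, because one annotator scored it far higher than the other three.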
Key takeaway
NLP engineers building subjectivity analysis systems should expect human annotation to be variable and context-dependent, and should plan for robust aggregation strategies. LLMs such as GPT-5 can produce plausible scores, but they treat direct quotes and colloquial language differently from human annotators, so they are not direct replacements for human raters. Use them as complementary tools for initial scoring or large-scale automation, and always validate against human benchmarks, especially in nuanced linguistic research.
Key insights
Human subjectivity annotation is inherently variable and influenced by context, while LLMs offer plausible but distinct scoring.
Principles
- Subjectivity is best captured on a continuous scale.
- Contextual factors influence human annotation consistency.
- LLMs interpret subjectivity differently from humans.
Method
A dataset of 1,000 texts was annotated on a continuous 0-100 subjectivity scale by four human annotators, and texts with highly divergent scores were re-annotated. GPT-5 also scored the texts, returning its output through a JSON schema.
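The summary does not reproduce the JSON schema used for GPT-5's output, so the sketch below is only a guess at what a schema-constrained score could look like. The single `score` field, the schema itself, and the `parse_score` helper are all hypothetical. The code makes no model call; it only checks a raw JSON reply against the assumed structure, using the Python standard library.

```python
import json

# Hypothetical schema: a single numeric score on the 0-100 scale.
# The schema actually used in the paper may differ.
SUBJECTIVITY_SCHEMA = {
    "type": "object",
    "properties": {
        "score": {"type": "number", "minimum": 0, "maximum": 100},
    },
    "required": ["score"],
    "additionalProperties": False,
}


def parse_score(raw_reply):
    """Parse a raw JSON reply and return the score as a float.

    Raises ValueError if the reply is not valid JSON or does not
    match the assumed structure above.
    """
    data = json.loads(raw_reply)  # JSONDecodeError is a ValueError
    if not isinstance(data, dict) or set(data) != {"score"}:
        raise ValueError("reply must be an object with only a 'score' key")
    score = data["score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("'score' must be a number")
    if not 0 <= score <= 100:
        raise ValueError("'score' must lie between 0 and 100")
    return float(score)


example = parse_score('{"score": 42}')
```

Validating each reply before storing it keeps malformed or out-of-range outputs from silently entering the score table.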
In practice
- Use continuous scales for nuanced subjective tasks.
- Randomize text order to mitigate human context effects.
- Compare LLM outputs to human baselines for divergence.
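The last two recommendations above can be sketched in Python as follows. This is one possible approach under stated assumptions, not the procedure used in the paper: the per-annotator seeding scheme, the use of the human mean as a baseline, the `threshold` of 25 points, and the function names are all hypothetical, and the scores are invented toy values.

```python
import random
from statistics import mean


def presentation_order(doc_ids, annotator, seed=0):
    """Return a reproducible shuffled order of documents for one annotator.

    Seeding with the annotator name gives each annotator an independently
    shuffled sequence, so order effects are not shared across annotators.
    """
    rng = random.Random(f"{seed}:{annotator}")
    order = list(doc_ids)
    rng.shuffle(order)
    return order


def llm_divergence(human_scores, llm_scores, threshold=25.0):
    """List documents where the LLM score is far from the human mean.

    `human_scores` maps a document id to the list of human scores,
    `llm_scores` maps a document id to a single LLM score.
    Returns (doc_id, llm_minus_human_mean) pairs, largest gap first.
    """
    report = []
    for doc_id, scores in human_scores.items():
        diff = llm_scores[doc_id] - mean(scores)
        if abs(diff) > threshold:
            report.append((doc_id, diff))
    return sorted(report, key=lambda item: abs(item[1]), reverse=True)


# Toy data (invented values, not taken from the dataset).
docs = ["d1", "d2", "d3"]
order_for_a = presentation_order(docs, "annotator_A")
human = {"d1": [10, 20, 15, 15], "d2": [80, 90, 85, 85], "d3": [30, 30, 30, 30]}
llm = {"d1": 18, "d2": 50, "d3": 70}
divergent = llm_divergence(human, llm)
```

Documents that appear in the divergence report are natural candidates for manual inspection, for example to check whether direct quotes or colloquial language explain the gap.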
Topics
- Estonian Subjectivity Dataset
- Document-level Subjectivity
- LLM Annotation
- Inter-annotator Agreement
- Natural Language Processing
- GPT-5
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.