Quantifying and Predicting Disagreement in Graded Human Ratings

2026-03-25 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing · Depth: Expert, long

Summary

Leixin Zhang and Çağrı Çöltekin of the University of Tübingen investigated patterns of annotation variation in graded human ratings of inappropriate language, including offensive language, hate speech, and the perception of toxic language. Their work asks whether the degree of annotator disagreement can be predicted from textual features. They introduce the Opposition Index, a new metric that quantifies perspective opposition among annotators for a given item, and examine how predictable items with potentially opposing human opinions are. The study found a moderate positive correlation between estimated and observed annotation variance. Two approaches, predicting variance directly and estimating it from predicted annotation distributions, performed comparably. Items with high Opposition Index values were harder to predict, and their annotation variance was often underestimated by the models.
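The second approach, estimating variance from a predicted annotation distribution, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a 5-point Likert scale, uses population variance for both the raw ratings and the distribution-implied variance (E[X²] minus E[X]²), and compares the two with a Pearson correlation, although the summary does not say which correlation coefficient the study reports. The function names and toy numbers are hypothetical.

```python
from math import sqrt

# Hypothetical 5-point Likert scale (assumption; the study's scale may differ).
SCALE = [1, 2, 3, 4, 5]

def observed_variance(ratings):
    """Population variance of the raw ratings one item received."""
    mean = sum(ratings) / len(ratings)
    return sum((r - mean) ** 2 for r in ratings) / len(ratings)

def variance_from_distribution(probs, scale=SCALE):
    """Variance implied by a predicted distribution over Likert points."""
    mean = sum(p * x for p, x in zip(probs, scale))
    second_moment = sum(p * x * x for p, x in zip(probs, scale))
    return second_moment - mean ** 2

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data: raw ratings per item and a model's predicted distributions.
raw = [[1, 1, 2, 1], [2, 3, 3, 4], [1, 5, 1, 5]]
predicted = [
    [0.7, 0.2, 0.1, 0.0, 0.0],
    [0.1, 0.3, 0.4, 0.2, 0.0],
    [0.3, 0.1, 0.2, 0.1, 0.3],
]

observed = [observed_variance(r) for r in raw]
estimated = [variance_from_distribution(p) for p in predicted]
print(observed, estimated, pearson(estimated, observed))
```

In this toy data the third item, split between the two ends of the scale, has an observed variance of 4.0 but an estimated variance of only about 2.6, the same kind of underestimation the summary reports for high-opposition items.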

Key takeaway

If you develop NLP models for subjective tasks such as hate speech detection, consider adding methods that predict annotation variance and identify opposing opinions. Doing so lets you streamline annotation workflows by prioritizing ambiguous or controversial items for more intensive review, which may make your models more robust and fairer in handling diverse human judgments. Expect your models to underestimate disagreement on highly polarized content, and plan additional scrutiny for those cases.

Key insights

Annotation disagreement in graded human ratings of inappropriate language can be predicted from textual features, though only moderately well.

Principles

Disagreement is inherent in many annotation tasks.
Not all items elicit the same degree of opinion divergence.
Likert scales capture fine-grained human perception differences.

Method

The study uses Likert-scale ratings and loss functions for ordinal data (Earth Mover's Distance, cumulative cross-entropy) to infer full rating distributions, and proposes an Opposition Index to quantify opposing stances.
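The two ordinal losses named above can be sketched for a single item as follows. These are common formulations assumed here, not taken from the paper: EMD between two distributions on equally spaced ordered points reduces to the summed absolute difference of their cumulative distributions, and cumulative cross-entropy applies binary cross-entropy to each threshold P(Y ≤ k). The target is the empirical distribution of an item's raw ratings, so a split vote stays split instead of collapsing to one label. All function names are hypothetical, and a training version would typically operate on batched tensors.

```python
from itertools import accumulate
from math import log

def empirical_distribution(ratings, k=5):
    """Share of annotators choosing each point on a 1..k Likert scale."""
    return [ratings.count(point) / len(ratings) for point in range(1, k + 1)]

def cdf(probs):
    """Cumulative distribution over the ordered Likert points."""
    return list(accumulate(probs))

def emd_loss(pred, target):
    """1D Earth Mover's Distance with unit spacing between adjacent points:
    the sum of absolute differences between the two CDFs."""
    return sum(abs(a - b) for a, b in zip(cdf(pred), cdf(target)))

def cumulative_cross_entropy(pred, target, eps=1e-12):
    """Binary cross-entropy on each threshold P(Y <= k), k = 1..K-1,
    summed over thresholds (one common formulation)."""
    loss = 0.0
    # The final CDF value is always 1, so it is skipped.
    for p, t in zip(cdf(pred)[:-1], cdf(target)[:-1]):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss -= t * log(p) + (1 - t) * log(1 - p)
    return loss

# An item whose annotators split between the low and high ends of the scale.
split = empirical_distribution([1, 5, 1, 5])  # [0.5, 0.0, 0.0, 0.0, 0.5]
middle = [0.0, 0.0, 1.0, 0.0, 0.0]            # a prediction of consensus on 3

# Ordinal awareness: distance along the scale matters.
onehot_1 = [1.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 0.0, 1.0]

print(emd_loss(middle, split), emd_loss(split, split))
print(emd_loss(near, onehot_1), emd_loss(far, onehot_1))
print(cumulative_cross_entropy(middle, split), cumulative_cross_entropy(split, split))
```

EMD charges a prediction by how far probability mass has to move along the scale: predicting 2 when every annotator chose 1 costs 1.0, while predicting 5 costs 4.0. Plain categorical cross-entropy would score those two one-hot mistakes identically, since it only looks at the probability assigned to the correct point.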

In practice

Identify disagreement-prone items to optimize annotation workflows.
Flag conflicting cases for expert review.
Improve decision fairness by identifying disagreement.
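The workflow ideas in the list above can be sketched as a simple triage step: rank items by a model's predicted variance and route the most disagreement-prone ones to expert review within a fixed budget. The function name, the budget parameter, and the toy values are hypothetical. Because the summary notes that models tend to underestimate variance for highly polarized items, a more generous review budget than the raw predictions suggest may be prudent.

```python
def triage(items, predicted_variance, review_budget):
    """Split items into (flagged for expert review, routine annotation).

    Items are ranked by predicted annotation variance, highest first;
    the top `review_budget` items are flagged.
    """
    if len(items) != len(predicted_variance):
        raise ValueError("items and predicted_variance must have the same length")
    order = sorted(range(len(items)), key=lambda i: predicted_variance[i], reverse=True)
    flagged = [items[i] for i in order[:review_budget]]
    routine = [items[i] for i in order[review_budget:]]
    return flagged, routine

posts = ["post-a", "post-b", "post-c", "post-d"]
variances = [0.2, 3.1, 0.9, 2.4]
flagged, routine = triage(posts, variances, review_budget=2)
print(flagged, routine)
```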

Topics

Rating Disagreement Prediction
Annotation Variance
Opposition Index
Likert Scale Ratings
Inappropriate Language Detection

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.