Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, AI Ethics & Responsible AI) · Depth: Advanced, quick

Summary

A systematic literature review spanning 2020-2025 across seven premier venues (ACL, AIES, CHI, CSCW, EAAMO, FAccT, NeurIPS) critiques the "ground truth" paradigm in machine learning data annotation. The review, which analyzed 346 papers from an initial 30,897 records, argues that treating human disagreement as mere noise is a positivistic fallacy. It identifies a "consensus trap" facilitated by systemic failures in positional legibility and a shift towards human-as-verifier models, particularly model-mediated annotations, which introduce anchoring bias and marginalize human voices. The analysis also highlights how geographic hegemony imposes Western norms as universal benchmarks, often enforced by precarious data workers prioritizing compliance. The authors advocate for reclaiming disagreement as a high-fidelity signal to build culturally competent models and propose a roadmap for pluralistic annotation infrastructures aimed at mapping diverse human experiences rather than seeking a singular "right" answer.

Key takeaway

Research scientists developing machine learning models should critically re-evaluate their reliance on a singular "ground truth" in data annotation. Human disagreement is a valuable signal for building culturally competent models, not a defect to be minimized. Annotation processes that explicitly capture and integrate diverse perspectives help avoid anchoring bias and Western-centric benchmarks, which in turn supports model robustness and fairness.

Key insights

Human disagreement in data annotation is a vital sociotechnical signal, not mere noise.
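
One way to make this concrete is to compare collapsing annotations into a single label with keeping the full distribution of annotator responses. The sketch below is illustrative only and is not taken from the reviewed paper: the function names, the toy "toxic" / "not_toxic" labels, and the use of normalized entropy as a disagreement score are all assumptions chosen for the example.

```python
from collections import Counter
from math import log2


def majority_vote(labels):
    """Collapse annotations into one label (the single 'ground truth' approach).

    Ties resolve to the label that appears first in the input.
    """
    return Counter(labels).most_common(1)[0][0]


def soft_label(labels):
    """Keep the full distribution of annotator responses instead of one label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def normalized_entropy(dist, num_classes):
    """Disagreement score in [0, 1]: 0 means unanimous, 1 means an even split."""
    if num_classes < 2:
        return 0.0
    entropy = sum(-p * log2(p) for p in dist.values() if p > 0)
    return entropy / log2(num_classes)


# Hypothetical toy data: four annotators per item, two possible labels.
annotations = {
    "item_a": ["toxic", "toxic", "toxic", "toxic"],
    "item_b": ["toxic", "not_toxic", "toxic", "not_toxic"],
}

for item_id, labels in annotations.items():
    dist = soft_label(labels)
    print(item_id, majority_vote(labels), dist, normalized_entropy(dist, 2))
```

Both items receive the same majority-vote label, yet only the soft label and the disagreement score reveal that annotators were evenly split on the second item. That lost information is what the review argues should be treated as signal.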

Method

The systematic literature review narrowed 30,897 initial records to 346 papers (about 1.1%) through tiered keyword filtration and manual screening, then applied reflexive thematic analysis to the retained papers.

Best for: Research Scientist, AI Researcher, AI Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.