Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization

2026-05-27 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

The study "Human Label Variation as Stable Signal" asks whether large language models (LLMs) can learn and reproduce annotator-specific reasoning and preferences from free-text explanations. The authors analyze human label variation (HLV) on two sentence-pair tasks, natural language inference and paraphrase judgment, each with four annotators. Individual annotator patterns are weak at the level of a single annotation because the input content dominates, but they become discernible after input-content reduction and aggregation. The work compares prompting and supervised fine-tuning (SFT) baselines, then introduces Cross-Annotator Preference Optimization (CAPO). In the reported experiments, prompting is unstable, SFT captures annotator behavior better, and CAPO further improves aggregation-aware imitation and judge-based attribution while preserving target-specific reasoning patterns. The authors take this as evidence that HLV can serve as a stable signal for scalable, explanation-based annotation.

Key takeaway

For machine learning engineers building explanation-based annotation systems, this research indicates that modeling annotator-specific reasoning patterns can improve how faithfully a model reproduces an individual annotator's explanations. Teams should consider moving beyond simple label agreement by training LLMs on individual annotator histories with methods such as Cross-Annotator Preference Optimization (CAPO). The approach offers a path to more scalable and nuanced data labeling, with the potential to reduce annotation costs and improve explanation quality.

Key insights

Large language models can learn and reproduce individual annotator explanation styles from human label variation.

Principles

HLV reveals annotator reasoning.
Annotator patterns stabilize with aggregation.
Preference optimization enhances imitation.
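
This summary does not state which objective CAPO optimizes. As an illustration of what "preference optimization" means numerically, the sketch below uses a DPO-style pairwise loss, which is an assumption made here and not the paper's confirmed formulation. The function name, argument names, and the default `beta` are hypothetical; inputs are summed token log-probabilities of a preferred and a dispreferred response under the trained policy and a frozen reference model.

```python
import math


def dpo_style_loss(policy_chosen_lp, policy_rejected_lp,
                   ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Pairwise preference loss: -log(sigmoid(margin)).

    Assumption: a DPO-style objective stands in for whatever loss the
    paper actually uses. All *_lp values are sequence log-probabilities.
    """
    # How much more the policy favors the chosen response than the
    # reference does, minus the same quantity for the rejected response.
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When policy and reference agree, the margin is 0 and the loss is log(2).
baseline = dpo_style_loss(-5.0, -5.0, -5.0, -5.0)
# Raising the chosen response's log-probability lowers the loss.
improved = dpo_style_loss(-4.0, -6.0, -5.0, -5.0)
```

Under this assumed objective, the loss falls as the policy assigns relatively more probability to the preferred explanation than the reference model does, which is the sense in which preference training can sharpen imitation beyond SFT.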

Method

Cross-Annotator Preference Optimization (CAPO) contrasts a target annotator's response with other valid but less target-specific annotations for the same input to learn individual explanation behavior.
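
The pairing described above can be sketched as a small data-preparation step. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Annotation` record, its field names, the `build_cross_annotator_pairs` helper, and the example data are all hypothetical. For each input, the target annotator's response becomes the preferred ("chosen") side and every other annotator's response to the same input becomes a dispreferred ("rejected") side.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record layout; the paper's data format is not given here.
    item_id: str
    annotator: str
    label: str
    explanation: str


def build_cross_annotator_pairs(annotations, target):
    """Return preference pairs that favor `target` over other annotators."""
    by_item = defaultdict(list)
    for ann in annotations:
        by_item[ann.item_id].append(ann)

    pairs = []
    for item_id, group in by_item.items():
        chosen = [a for a in group if a.annotator == target]
        rejected = [a for a in group if a.annotator != target]
        # Items the target did not annotate, or that only the target
        # annotated, yield no pairs because there is nothing to contrast.
        for c in chosen:
            for r in rejected:
                pairs.append({
                    "item_id": item_id,
                    "chosen": f"{c.label}: {c.explanation}",
                    "rejected": f"{r.label}: {r.explanation}",
                })
    return pairs


# Invented toy data for illustration only.
EXAMPLE = [
    Annotation("nli-1", "A", "entailment", "The premise states it directly."),
    Annotation("nli-1", "B", "neutral", "The premise leaves the time open."),
    Annotation("nli-1", "C", "entailment", "Both mention the same event."),
    Annotation("nli-2", "A", "contradiction", "The numbers disagree."),
    Annotation("nli-2", "B", "contradiction", "One sentence negates the other."),
    Annotation("nli-3", "B", "neutral", "Not enough information."),
]

pairs_for_a = build_cross_annotator_pairs(EXAMPLE, target="A")
```

Note that a rejected response may carry the same label as the chosen one, as with `nli-2` here. That matches the description of the contrast set as valid but less target-specific annotations: the pair teaches whose explanation it is, not which label is correct.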

In practice

Use SFT for initial behavior capture.
Apply CAPO for refined imitation.
Ground annotations in annotator histories.

Topics

Human Label Variation
Large Language Models
Cross-Annotator Preference Optimization
Supervised Fine-Tuning
Natural Language Inference
Explanation-based Annotation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.