Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

The study "Human Label Variation as Stable Signal" investigates whether large language models (LLMs) can learn and replicate annotator-specific reasoning and preferences from free-text explanations. The researchers analyzed human label variation (HLV) across two sentence-pair tasks, natural language inference and paraphrase judgment, each annotated by four annotators. They observed that individual annotator patterns are weak at the level of a single annotation, where input content dominates, but become discernible once input-content effects are reduced and annotations are aggregated. The work compared prompting and supervised fine-tuning (SFT) baselines, then introduced Cross-Annotator Preference Optimization (CAPO). In the experiments, prompting was unstable, SFT improved behavior capture, and CAPO further improved aggregation-aware imitation and judge-based attribution while maintaining target-specific reasoning patterns. The results suggest that HLV can serve as a stable signal for scalable, explanation-based annotation.

Key takeaway

For machine learning engineers building explanation-based annotation systems, this research indicates that modeling annotator-specific reasoning patterns can improve how faithfully a model reproduces an individual annotator's behavior. Teams should consider going beyond simple label agreement by training LLMs on individual annotator histories with methods such as Cross-Annotator Preference Optimization (CAPO). This approach offers a path to more scalable and nuanced data labeling, potentially reducing annotation costs and improving explanation quality.

Key insights

Large language models can learn and reproduce individual annotator explanation styles from human label variation.

Method

Cross-Annotator Preference Optimization (CAPO) contrasts a target annotator's response with other valid but less target-specific annotations for the same input to learn individual explanation behavior.
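The pairing idea described above can be illustrated with a minimal sketch. It only covers how preference pairs might be assembled: for each input, the target annotator's response is treated as "chosen" and every other annotator's response to the same input as "rejected". The record layout, the names `Annotation`, `PreferencePair`, and `build_pairs`, and the sample data are all hypothetical, and the summary does not specify the training objective that consumes such pairs.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One annotator's label and free-text explanation (hypothetical layout)."""
    item_id: str
    annotator: str
    label: str
    explanation: str


class PreferencePair(NamedTuple):
    """A (chosen, rejected) pair of responses for a single input."""
    item_id: str
    chosen: str
    rejected: str


def render(a: Annotation) -> str:
    # Assumed response format: the label followed by the explanation.
    return f"{a.label}: {a.explanation}"


def build_pairs(annotations: list[Annotation], target: str) -> list[PreferencePair]:
    """Pair the target annotator's response with every other annotator's
    response to the same input. The rejected side is still a valid
    annotation; it is simply less specific to the target annotator."""
    by_item: dict[str, list[Annotation]] = defaultdict(list)
    for a in annotations:
        by_item[a.item_id].append(a)

    pairs: list[PreferencePair] = []
    for item_id, group in by_item.items():
        mine = [a for a in group if a.annotator == target]
        others = [a for a in group if a.annotator != target]
        # Items the target never annotated yield no pairs.
        for t in mine:
            for o in others:
                pairs.append(PreferencePair(item_id, render(t), render(o)))
    return pairs


# Made-up sample data, for illustration only.
EXAMPLE = [
    Annotation("p1", "A1", "entailment", "The premise states this directly."),
    Annotation("p1", "A2", "neutral", "It is likely but never stated."),
    Annotation("p1", "A3", "entailment", "Both sentences describe the same event."),
    Annotation("p1", "A4", "neutral", "The timing is left ambiguous."),
    Annotation("p2", "A2", "contradiction", "The counts do not match."),
    Annotation("p2", "A3", "neutral", "The subject might differ."),
]

pairs_for_a1 = build_pairs(EXAMPLE, target="A1")
```

In this sketch the rejected responses are not wrong answers, which matches the description of contrasting against "other valid but less target-specific annotations"; any filtering or weighting of pairs would be a further design choice not covered here.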

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.