From Consensus to Split Decisions: ABC-Stratified Sentiment in Holocaust Oral Histories

· Source: cs.CL updates on arXiv.org · Field: Science & Research — Social Sciences & Behavioral Studies, Research Methodology & Innovation · Depth: Advanced, long

Summary

A diagnostic study analyzed the performance of three pretrained transformer-based sentiment classifiers on a corpus of Holocaust oral histories comprising 107,305 utterances and 579,013 sentences. The study, by Daban Q. Jaff of Erfurt University, found that off-the-shelf models disagree substantially. It attributes this to domain shift and finds that the disagreement concentrates around the "Neutral" boundary. The study introduced an agreement-based stability taxonomy (ABC) to stratify how stable outputs are across models, and reported pairwise percent agreement, Cohen's κ, and Fleiss' κ. Overall inter-model agreement was low to moderate, and the disagreement was driven primarily by decisions about whether an item is neutral. An auxiliary T5-based emotion classifier, applied to stratified samples, revealed polarity-consistent affective profiles in high-agreement strata and more blended profiles in disagreement regions. The framework offers an operational way to characterize how sentiment models diverge on sensitive historical narratives.
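The three agreement statistics named above can be computed with a few lines of standard-library Python. This is a minimal sketch, not the study's code. The model names `m1` to `m3` and their toy labels are invented for illustration. The functions follow the textbook definitions of percent agreement, Cohen's κ (two raters) and Fleiss' κ (a fixed number of raters per item).

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Share of items on which two label sequences coincide."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each rater's own marginal label frequencies.
    p_e = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one tuple of labels (one per rater) per item."""
    N, n = len(ratings), len(ratings[0])
    totals = Counter()
    p_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Fraction of rater pairs that agree on this item.
        p_sum += (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
    p_bar = p_sum / N
    # Expected agreement from the pooled label proportions across all raters.
    p_e = sum((t / (N * n)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy predictions from three models over the same six sentences.
m1 = ["Neutral", "Positive", "Negative", "Neutral", "Positive", "Negative"]
m2 = ["Neutral", "Positive", "Neutral", "Positive", "Positive", "Negative"]
m3 = ["Positive", "Positive", "Negative", "Neutral", "Neutral", "Negative"]

models = {"m1": m1, "m2": m2, "m3": m3}
for (name_a, a), (name_b, b) in combinations(models.items(), 2):
    print(name_a, name_b, round(percent_agreement(a, b), 3), round(cohen_kappa(a, b), 3))
print("Fleiss", round(fleiss_kappa(list(zip(m1, m2, m3))), 3))
```

On real data, each list would hold one model's predicted label per utterance or sentence. Cohen's κ is computed per model pair, while Fleiss' κ summarizes all three models at once. Both correct raw agreement for the agreement expected by chance from the label marginals.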

Key takeaway

For NLP engineers working with sensitive historical narratives, expect off-the-shelf sentiment models to disagree substantially, especially about what counts as neutral. Use a diagnostic framework such as the ABC taxonomy to map where the models reach consensus and where they conflict. This supports more cautious interpretation and targeted downstream analysis, instead of treating any single model's output as ground truth.

Key insights

Off-the-shelf sentiment models diverge substantially on Holocaust oral histories, mainly at the "Neutral" boundary, which the study attributes to domain shift.
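One way to check whether disagreement really concentrates at the Neutral boundary is to split every pairwise disagreement into two kinds. The first kind is a case where one model says Neutral and the other assigns a polar label. The second is an outright polarity flip (Positive versus Negative). The function below is a hypothetical diagnostic, not one described in the study. It assumes a three-class label set with the literal label "Neutral", and the toy predictions are invented.

```python
from collections import Counter
from itertools import combinations


def disagreement_profile(predictions, neutral="Neutral"):
    """Split pairwise disagreements into Neutral-boundary cases and polarity flips.

    `predictions` maps a model name to its labels, all lists in the same item order.
    Returns the share of each disagreement kind (empty dict if the models never disagree).
    Assumes a three-class scheme, so any non-neutral disagreement is a polarity flip.
    """
    kinds = Counter()
    for a, b in combinations(predictions.values(), 2):
        for x, y in zip(a, b):
            if x == y:
                continue
            # Neutral on either side counts as a boundary case; otherwise the polarity flipped.
            kinds["neutral_boundary" if neutral in (x, y) else "polarity_flip"] += 1
    total = sum(kinds.values())
    if total == 0:
        return {}
    return {k: kinds[k] / total for k in ("neutral_boundary", "polarity_flip")}


# Hypothetical toy predictions from three models over four sentences.
preds = {
    "a": ["Neutral", "Positive", "Negative", "Positive"],
    "b": ["Positive", "Positive", "Neutral", "Negative"],
    "c": ["Neutral", "Negative", "Negative", "Positive"],
}
print(disagreement_profile(preds))
```

The toy data are balanced only to exercise both branches. On real predictions, a large `neutral_boundary` share would be consistent with the pattern described above.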

Principles

Method

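The ABC taxonomy stratifies items by inter-model agreement into three categories: full agreement (all three models assign the same label), partial agreement (two of the three agree), and maximal conflict (all three differ). It uses majority voting for triangulation, applies κ-based diagnostics to quantify agreement, and complements these with emotion profiling of stratified samples using an auxiliary T5-based classifier.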
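The stratification and majority-vote steps can be sketched for three models as follows. The mapping of the letters A, B and C to full agreement, partial agreement and maximal conflict is an assumption, as are the strict-majority rule and all function names. None of these is taken from the study's code.

```python
from collections import Counter


def abc_stratum(labels):
    """Map one item's three model labels to a stratum.

    Assumed mapping: A = full agreement (3/3), B = partial agreement (2/3),
    C = maximal conflict (all three labels differ).
    """
    if len(labels) != 3:
        raise ValueError("this sketch assumes exactly three models")
    top_count = Counter(labels).most_common(1)[0][1]
    return {3: "A", 2: "B", 1: "C"}[top_count]


def majority_label(labels):
    """Strict-majority vote; returns None when no label wins (stratum C)."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def stratify(*model_outputs):
    """Return (stratum, majority label) for each item across parallel label lists."""
    return [(abc_stratum(item), majority_label(item)) for item in zip(*model_outputs)]


# Hypothetical toy predictions from three models over three sentences.
m1 = ["Positive", "Neutral", "Negative"]
m2 = ["Positive", "Neutral", "Neutral"]
m3 = ["Positive", "Negative", "Positive"]

result = stratify(m1, m2, m3)
print(result)
print(Counter(stratum for stratum, _ in result))
```

Items in the maximal-conflict stratum receive no majority label, which keeps them out of consensus-based downstream analysis. Samples drawn from each stratum could then be passed to an emotion classifier, as the study does with its T5-based model.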

Best for: AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.