From Consensus to Split Decisions: ABC-Stratified Sentiment in Holocaust Oral Histories
Summary
A diagnostic study analyzed the performance of three pretrained transformer-based sentiment classifiers on a corpus of 107,305 utterances and 579,013 sentences from Holocaust oral histories. The study, conducted by Daban Q. Jaff of the University of Erfurt, found that off-the-shelf models disagree substantially under domain shift, particularly around the "Neutral" boundary. It introduces an agreement-based stability taxonomy (ABC) to stratify inter-model output stability, and reports pairwise percent agreement, Cohen's κ, and Fleiss' κ. Overall inter-model agreement was low to moderate, driven primarily by boundary decisions concerning neutrality. An auxiliary T5-based emotion classifier, applied to stratified samples, revealed polarity-consistent affective profiles in high-agreement strata and more blended profiles in disagreement regions. The framework provides an operational method for characterizing how sentiment models diverge in sensitive historical narratives.
Key takeaway
NLP engineers working with sensitive historical narratives should expect off-the-shelf sentiment models to disagree substantially, especially about neutrality. Diagnostic frameworks such as the ABC taxonomy map model behavior and identify areas of high consensus or conflict. This supports more cautious interpretation and targeted downstream analysis than treating a single model's output as ground truth.
Key insights
Off-the-shelf sentiment models diverge significantly on Holocaust oral histories, mainly at the "Neutral" boundary, due to domain shift.
Principles
- Domain shift substantially challenges polarity detection in complex narratives.
- Inter-model disagreement signals domain-shift sensitivity.
- Ensembling models can triangulate heterogeneous knowledge sources.
Method
The ABC taxonomy stratifies model outputs into three categories: full agreement, partial agreement, and maximal conflict. It uses majority voting for triangulation and applies κ-based diagnostics to quantify agreement, complemented by emotion profiling.
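The κ-based diagnostics named above can be sketched in plain Python. This is a minimal illustration using the textbook definitions of percent agreement, Cohen's κ, and Fleiss' κ, not the study's own code; the function names and the two-label toy data are hypothetical.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items on which two label sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from each rater's own label distribution.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)

def fleiss_kappa(items):
    """Fleiss' kappa; `items` is a list of per-item label lists,
    each with the same number of raters (at least two)."""
    n_items, n_raters = len(items), len(items[0])
    totals = Counter()
    p_bar = 0.0
    for labels in items:
        counts = Counter(labels)
        totals.update(counts)
        # Per-item agreement: share of rater pairs that agree.
        p_bar += (sum(v * v for v in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar /= n_items
    p_e = sum((v / (n_items * n_raters)) ** 2 for v in totals.values())
    if p_e == 1.0:
        return 1.0
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical toy labels, for illustration only.
model_a = ["x", "x", "y", "y"]
model_b = ["x", "y", "y", "y"]
print(percent_agreement(model_a, model_b), cohen_kappa(model_a, model_b))
```

Pairwise κ shows which model pairs diverge, while Fleiss' κ summarizes agreement across all models at once. In a real pipeline an established library implementation would normally be preferred over hand-rolled functions.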
In practice
- Use the ABC taxonomy to identify high-consensus sentiment subsets.
- Flag or filter regions of high model disagreement.
- Consider emotion profiling for affective signatures in agreement strata.
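The first two practices above can be sketched as a small stratification step. This is a minimal sketch, assuming three models that each emit one of three labels, and assuming that A means all three models agree, B means two of three agree, and C means all three differ. That letter mapping, the function names, and the label strings are illustrative assumptions, not details confirmed by the summary.

```python
from collections import Counter, defaultdict

def abc_stratum(labels):
    """Return (stratum, majority_label) for one item's three model labels.

    Assumed mapping: A = full agreement, B = partial agreement (2 of 3),
    C = maximal conflict (all three differ, so no majority exists).
    """
    if len(labels) != 3:
        raise ValueError("expected exactly three model labels")
    top_label, top_count = Counter(labels).most_common(1)[0]
    if top_count == 3:
        return "A", top_label
    if top_count == 2:
        return "B", top_label
    return "C", None

def stratify(predictions):
    """Group item indices by stratum.

    `predictions` holds three equal-length label lists, one per model.
    Returns {stratum: [(item_index, majority_label), ...]}.
    """
    strata = defaultdict(list)
    for idx, labels in enumerate(zip(*predictions)):
        stratum, majority = abc_stratum(labels)
        strata[stratum].append((idx, majority))
    return dict(strata)

# Hypothetical toy predictions from three models.
preds = [
    ["neg", "neu", "pos", "neg"],
    ["neg", "neg", "neu", "neg"],
    ["neg", "neu", "neg", "pos"],
]
strata = stratify(preds)
high_consensus = strata.get("A", [])  # candidates for confident analysis
needs_review = strata.get("C", [])    # flag or filter these items
print(high_consensus, needs_review)
```

Stratum C carries no majority label because a vote among three distinct labels has no winner, so those items are better routed to manual review than assigned a polarity.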
Topics
- Holocaust Oral Histories
- Sentiment Analysis
- Transformer Models
- Model Disagreement
- Domain Shift
Best for: AI Scientist, Research Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.