More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

This study systematically compares three levers for Schwartz value detection in political texts: input context, retrieved moral knowledge, and model scale. Full-document context improved supervised DeBERTa-v3 encoders by 3.8–4.8 macro-F1 points over sentence-only input, but did not consistently help zero-shot LLMs ranging from 12B to 123B parameters. Retrieved moral knowledge, drawn from a curated 58-chunk knowledge base and injected via early fusion, improved macro-F1 by 0.014 to 0.036 (1.4 to 3.6 points) across all tested model families and context conditions. Scaling up, whether from DeBERTa-v3-base to DeBERTa-v3-large or from 12B to larger LLMs, did not reliably improve performance. The strongest system was DeBERTa-v3-base with full-document context and early-fusion RAG, at 0.314 macro-F1, which suggests that task supervision matters more than parameter count under this protocol.

Key takeaway

Machine learning engineers building value-sensitive NLP systems should prioritize supervised encoders such as DeBERTa-v3-base and choose the context length deliberately (full-document input helped the encoders in this study). Add simple early-fusion retrieval-augmented generation (RAG) over a curated moral knowledge base, particularly where label boundaries are ambiguous. This approach offers better performance and interpretability than larger zero-shot LLMs, and it is cheaper to run and easier to reproduce. Always evaluate performance per value, not just aggregate macro-F1.

Key insights

Value detection performance depends critically on context, external knowledge, and model architecture, not just scale.

Principles

More context is not always better for value detection.
Explicit moral knowledge consistently aids fine-grained value detection.
Model scale does not guarantee performance improvements.

Method

The study varied three factors: input context (sentence, window, or full document), retrieval (no RAG vs. retrieval augmentation from a 58-chunk moral knowledge base, fused early, late, or via cross-attention), and model family (supervised DeBERTa-v3-base/large encoders vs. zero-shot LLMs with 12B–123B parameters).
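The context conditions and the early-fusion setting can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window size, the bracketed section markers, and the word-overlap retriever are all assumptions chosen for illustration, and a real system would more likely use a dense retriever and the tokenizer's own separator tokens.

```python
from typing import List


def build_context(sentences: List[str], idx: int, mode: str, window: int = 1) -> str:
    """Return the context shown to the model for the target sentence at `idx`.

    `mode` is one of "sentence", "window", or "document" (names assumed here).
    """
    if mode == "sentence":
        return sentences[idx]
    if mode == "window":
        lo = max(0, idx - window)
        hi = min(len(sentences), idx + window + 1)
        return " ".join(sentences[lo:hi])
    if mode == "document":
        return " ".join(sentences)
    raise ValueError(f"unknown mode: {mode}")


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Toy retriever (an assumption): rank chunks by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def early_fusion(target: str, context: str, retrieved: List[str]) -> str:
    """Early fusion: place retrieved knowledge in the model input itself,
    so a single forward pass sees knowledge, context, and target together."""
    knowledge = " ".join(retrieved)
    return f"[KNOWLEDGE] {knowledge} [CONTEXT] {context} [TARGET] {target}"


# Hypothetical document and knowledge chunks, invented for this example.
sentences = [
    "We must protect our borders.",
    "Tradition binds us together.",
    "Everyone deserves equal rights.",
]
chunks = [
    "Security: safety, harmony, and stability of society.",
    "Tradition: respect for customs and tradition binds communities.",
    "Universalism: equal rights and tolerance for everyone.",
]

target = sentences[1]
context = build_context(sentences, 1, "document")
model_input = early_fusion(target, context, retrieve(target, chunks, k=1))
```

Late fusion would instead combine the outputs of separate passes over the text and the knowledge, and cross-attention fusion would let the text representation attend to encoded chunks inside the model; neither is shown here.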

In practice

Use early fusion for retrieval-augmented value detection.
Prioritize task-specific supervision over raw LLM parameter count.
Analyze per-value performance, not just aggregate metrics.
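The last recommendation can be made concrete with a small sketch of per-value F1 next to macro-F1. It assumes a multi-label setup in which each sentence carries a set of value labels; the labels, predictions, and function names below are invented for illustration and are not taken from the paper.

```python
from typing import Dict, List, Set


def per_value_f1(
    gold: List[Set[str]], pred: List[Set[str]], values: List[str]
) -> Dict[str, float]:
    """Compute F1 separately for each value label over multi-label examples."""
    scores: Dict[str, float] = {}
    for value in values:
        tp = sum(1 for g, p in zip(gold, pred) if value in g and value in p)
        fp = sum(1 for g, p in zip(gold, pred) if value not in g and value in p)
        fn = sum(1 for g, p in zip(gold, pred) if value in g and value not in p)
        denom = 2 * tp + fp + fn
        # Convention assumed here: F1 is 0.0 when a value never occurs in gold or pred.
        scores[value] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(scores: Dict[str, float]) -> float:
    """Macro-F1 is the unweighted mean of the per-value F1 scores."""
    return sum(scores.values()) / len(scores)


# Hypothetical gold labels and predictions for four sentences.
values = ["security", "tradition", "universalism"]
gold = [{"security"}, {"tradition"}, {"universalism", "security"}, set()]
pred = [{"security"}, {"security"}, {"universalism"}, set()]

scores = per_value_f1(gold, pred, values)
overall = macro_f1(scores)
```

In this toy data the macro-F1 is 0.5, while the per-value scores are 0.5 for security, 0.0 for tradition, and 1.0 for universalism. The aggregate alone would hide the fact that one value is never detected.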

Topics

Schwartz Values
Value Detection
Political Text Analysis
Retrieval-Augmented Generation
DeBERTa-v3
Large Language Models

Code references

VictorMYeste/human-value-detection-context-rag

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.