Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Social Sciences & Behavioral Studies) · Depth: Expert, extended

Summary

The article investigates whether topic sentiment causally affects perceived political ideology in news articles. It compares human expert annotations from AllSides with labels from Llama-3.3-70b, baseline GPT-4o-mini, and fine-tuned GPT-4o-mini. Applying Double Machine Learning (DML) and mediation analysis to a dataset of N=1,265 articles, the study finds no significant causal effect of topic sentiment on ideology at the community level in the human annotations. In contrast, fine-tuned GPT-4o-mini, which achieved the highest classification accuracy (F1=72.48), was the only annotator to produce significant community-level treatment effects and natural direct effects (NDEs). This suggests that fine-tuning can lead to "shortcut learning," in which a model internalizes a spurious sentiment-ideology coupling absent from human judgment, a difference that standard F1-based evaluation does not reveal. The findings bear on the use of LLM annotations as "silver labels" in downstream causal analyses.

Key takeaway

If you are an AI Scientist or Research Scientist planning to use LLMs for social science annotation, evaluate models on more than standard accuracy metrics such as F1. A high F1 score can mask "shortcut learning," in which the model develops a spurious causal link between sentiment and ideology that human annotators do not exhibit. Use causal analysis frameworks such as mediation analysis to audit the LLM's annotations and check that their causal structure matches human judgment, especially when the labels feed downstream causal inference studies.
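As a rough illustration of what a mediation audit could look like, the sketch below decomposes a total effect into a natural direct effect (NDE) and a natural indirect effect (NIE) under a simple linear model with no treatment-mediator interaction. The summary does not state which mediator or estimator the paper uses, so the mediator `m`, the covariates `X`, the helper names, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np


def ols(y, *columns):
    """Least-squares fit with an intercept; returns [intercept, coefficients...]."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def linear_mediation(y, t, m, X):
    """Linear mediation decomposition (assumes no treatment-mediator interaction).

    y: outcome (e.g. an ideology score), t: treatment (e.g. topic sentiment),
    m: hypothetical mediator, X: covariate matrix of shape (n, k).
    """
    a = ols(m, t, X)[1]          # effect of treatment on mediator
    coef_y = ols(y, t, m, X)     # outcome model with treatment and mediator
    nde, b = coef_y[1], coef_y[2]
    total = ols(y, t, X)[1]      # total effect, mediator left out
    return {"nde": nde, "nie": a * b, "total": total}


def simulate(n, seed):
    """Synthetic data with known effects: NDE = 0.3, NIE = 0.8 * 0.5 = 0.4."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    t = X @ np.array([0.5, -0.3]) + rng.normal(size=n)
    m = 0.8 * t + X @ np.array([0.2, 0.1]) + rng.normal(size=n)
    y = 0.3 * t + 0.5 * m + X @ np.array([0.4, -0.2]) + rng.normal(size=n)
    return y, t, m, X


effects = linear_mediation(*simulate(5000, seed=0))
print(effects)
```

In an audit, the same decomposition would be run once with human ideology labels and once with each LLM's labels as the outcome; a direct effect that appears only for the LLM labels is the kind of discrepancy that F1 alone would not surface.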

Key insights

Fine-tuning LLMs for ideology prediction can create spurious sentiment-ideology causal links not present in human judgment.

Principles

Method

Compare human and LLM ideology labels while holding Llama-3.3-70b-versatile sentiment annotations constant. Apply Double Machine Learning and mediation analysis to identify causal effects of topic sentiment.
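The Double Machine Learning step can be sketched as a cross-fitted partialling-out estimator for a partially linear model. This is a minimal sketch, not the paper's pipeline: the summary does not say which nuisance learners or covariates were used, so plain linear regressions stand in for the learners, and every name here (`dml_partial_linear`, `make_synthetic`, the covariates `X`) is hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Fit a linear model with intercept by least squares."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def dml_partial_linear(y, t, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta * t + g(X) + e.

    Nuisance models for E[y|X] and E[t|X] are fit on the training folds and
    used to residualize the held-out fold; theta is the slope of the outcome
    residuals on the treatment residuals.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    t_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        y_hat = predict_linear(fit_linear(X[train], y[train]), X[test_idx])
        t_hat = predict_linear(fit_linear(X[train], t[train]), X[test_idx])
        y_res[test_idx] = y[test_idx] - y_hat
        t_res[test_idx] = t[test_idx] - t_hat
    theta = (t_res @ y_res) / (t_res @ t_res)
    # Standard error from the orthogonal score psi = t_res * (y_res - theta * t_res)
    psi = t_res * (y_res - theta * t_res)
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(t_res ** 2)
    return theta, se


def make_synthetic(n, theta, seed):
    """Synthetic data: t plays the role of topic sentiment, y of an ideology score."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))
    t = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)
    y = theta * t + X @ np.array([0.3, 0.4, -0.5]) + rng.normal(size=n)
    return y, t, X


y, t, X = make_synthetic(2000, theta=0.5, seed=1)
theta_hat, se = dml_partial_linear(y, t, X)
print(f"theta_hat={theta_hat:.3f}, se={se:.3f}")
```

To mirror the comparison described above, the treatment (sentiment annotations) would stay fixed while the outcome `y` is swapped between human and LLM ideology labels, giving one estimate per annotator. Flexible learners such as gradient boosting could replace the linear nuisance fits without changing the residualization logic.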

Best for: NLP Engineer, AI Scientist, Research Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.