Application of integrated gradients explainability to sociopsychological semantic markers

2026-06-18 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Social Sciences & Behavioral Studies · Depth: Expert, extended

Summary

This paper applies the integrated gradients (IG) method to analyze sociopsychological semantic markers in text at the word level, extending explainability beyond sentence-level classification. Focusing on "agency" and the BERTAgent classifier (based on RoBERTa), the authors tune IG parameters and find the "zero" baseline with N=300 steps to be the most reliable configuration. They show that IG is effective when large labeled datasets are available and, for small datasets, propose a training procedure that deliberately encourages overfitting so that IG can surface salient class-differentiating words. The method's plausibility is checked by comparing IG outputs with expert-highlighted text portions, which shows strong agreement on agentic content. The approach offers word-level insight into complex phenomena such as sexual objectification.

Key takeaway

NLP engineers and research scientists building explainable AI for sociopsychological text analysis should prioritize Integrated Gradients with a "zero" baseline and N=300 steps for robust word-level attribution. With limited labeled data, consider intentionally overfitting a RoBERTa-based classifier to 100% accuracy so that IG can identify salient, class-differentiating keywords, which can then enrich domain-specific dictionaries and clarify complex semantic markers.

Key insights

Integrated gradients reveal word-level contributions to sociopsychological text classification, even with limited data.

Principles

IG with a "zero" baseline and N=300 steps was the most reliable configuration tested.
Overfitting can enhance class distinctiveness for IG saliency.
Agreement with expert-highlighted text supports the plausibility of word-level attributions.
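
The first principle can be made concrete with a minimal sketch of integrated gradients on a toy model. This is an illustration under stated assumptions, not the paper's implementation: the model is a hypothetical three-feature logistic function rather than BERTAgent, a zero vector stands in for the "zero" baseline (how the paper defines that baseline at the embedding level is not given in this summary), and the path integral is approximated with a midpoint Riemann sum over N=300 steps. All names (`model`, `model_grad`, `integrated_gradients`, the weights) are made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    # Hypothetical toy classifier: logistic function of a linear score.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    # Analytic gradient of the toy model with respect to its input.
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, w, b, baseline=None, n_steps=300):
    # Zero baseline by default (assumption: stands in for the "zero" baseline).
    if baseline is None:
        baseline = [0.0] * len(x)
    delta = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps  # midpoint of the k-th interval on [0, 1]
        point = [bi + alpha * di for bi, di in zip(baseline, delta)]
        grad = model_grad(point, w, b)
        avg_grad = [a + g / n_steps for a, g in zip(avg_grad, grad)]
    # Attribution = (input - baseline) * average gradient along the path.
    return [di * gi for di, gi in zip(delta, avg_grad)]

w = [2.0, -1.0, 0.5]   # illustrative weights
b = -0.2
x = [1.0, 0.5, -1.0]   # illustrative input

attr = integrated_gradients(x, w, b)
gap = model(x, w, b) - model([0.0] * len(x), w, b)
print("attributions:", attr)
print("sum of attributions:", sum(attr), "output gap:", gap)
```

The two printed totals should nearly coincide: IG attributions sum to the difference between the model output at the input and at the baseline (the completeness property), and the step count controls how closely the numerical sum approaches that difference.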

Method

Train a RoBERTa-based NLP model to 100% accuracy by encouraging overfitting on small labeled datasets. Then, apply Integrated Gradients to identify class-differentiating keywords.
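
The two steps above can be sketched on a toy scale. The example below is a stand-in and not the paper's procedure: it swaps the RoBERTa-based model for a bag-of-words logistic regression, uses six invented sentences as the small labeled dataset, and trains until every training example is classified correctly before computing IG against an all-zeros (empty text) baseline. The data, function names, and hyperparameters are all hypothetical.

```python
import math

# Hypothetical toy data: 1 = agentic, 0 = non-agentic.
DATA = [
    ("we will lead and achieve our goals", 1),
    ("she decided to take control", 1),
    ("they strive to win", 1),
    ("he waited and hoped for help", 0),
    ("we were unable to act", 0),
    ("they gave up and waited", 0),
]
VOCAB = sorted({tok for text, _ in DATA for tok in text.split()})

def featurize(text):
    # Bag-of-words counts over the training vocabulary.
    toks = text.split()
    return [float(toks.count(v)) for v in VOCAB]

def predict(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_until_memorized(data, lr=0.5, max_epochs=1000):
    # No regularization or early stopping: train until the training set is fit perfectly.
    w, b = [0.0] * len(VOCAB), 0.0
    rows = [(featurize(t), y) for t, y in data]
    for epoch in range(1, max_epochs + 1):
        for x, y in rows:
            err = predict(x, w, b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
        if all((predict(x, w, b) > 0.5) == bool(y) for x, y in rows):
            return w, b, epoch
    raise RuntimeError("did not reach 100% training accuracy")

def word_attributions(text, w, b, n_steps=300):
    # IG from an all-zeros baseline; for this model the gradient is p * (1 - p) * w_i.
    x = featurize(text)
    avg = 0.0
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        p = predict([alpha * xi for xi in x], w, b)
        avg += p * (1.0 - p) / n_steps
    return {v: xi * wi * avg for v, xi, wi in zip(VOCAB, x, w) if xi > 0}

w, b, epochs = train_until_memorized(DATA)
attr_pos = word_attributions("we will lead and achieve our goals", w, b)
attr_neg = word_attributions("they gave up and waited", w, b)
for word, score in sorted(attr_pos.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {score:+.4f}")
```

Words that occur only in one class receive attributions with that class's sign, which is the sense in which a memorizing model can act as a keyword finder. Whether this carries over to a fine-tuned transformer on real data is what the paper investigates; the toy run does not demonstrate it.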

In practice

Use IG to identify key terms for "agency" in social discourse.
Apply the overfitting strategy to build new domain-specific dictionaries.
Analyze word saliency for complex markers like sexual objectification.

Topics

Integrated Gradients
Explainable AI
Sociopsychological Markers
Agency Detection
BERTAgent
Overfitting Strategy

Code references

ali-abbi/agency-ig

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.