Where Does Social Reasoning Come From? Capability Provenance in Language Models

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Natural Language Processing · Depth: Expert, extended

Summary

A study by Glenn Matlin et al. investigates the provenance of social and STEM reasoning capabilities in the OLMo3-7B language model by attributing its predictions to specific regions of the Dolma3 pretraining corpus. The authors apply gradient-based attribution (TrackStar via Bergson) and aggregate influence across WebOrganizer's 576-bin topic-format taxonomy, contrasting SocialIQA and MMLU Social Sciences against ARC-Challenge and MMLU STEM. The findings indicate that social and STEM reasoning draw on qualitatively distinct corpus regions, and that this separation is sharper for reasoning tasks than for knowledge tasks. Targeted machine unlearning partially validates these associations: forgetting high-attribution topic bins degrades the aligned benchmarks more than random baselines do. All code, sampling manifests, the bin-level influence matrix, and unlearning checkpoints are openly released.

Key takeaway

For AI Scientists and Machine Learning Engineers focused on model transparency and control, this research offers a concrete methodology. You should consider implementing taxonomic training-data attribution and targeted unlearning to audit and refine your models. This approach helps pinpoint which data regions drive specific capabilities, enabling more precise data curation, behavior modification, and safety-oriented auditing of large language models.

Key insights

Training-data attribution reveals distinct corpus regions supporting social versus STEM reasoning in LLMs.

Principles

Capability provenance can be mapped to aggregate corpus regions.
The social/STEM separation in corpus provenance is sharper for reasoning tasks than for knowledge tasks.
Structured taxonomic aggregation improves attribution interpretability.
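
One way to make "distinct corpus regions" concrete is to compare the bin-level influence profiles of two benchmarks. The sketch below assumes such profiles are already available as numeric arrays with one entry per taxonomy bin. The function names and the choice of cosine similarity and z-scored differences are illustrative assumptions, not the metrics reported in the paper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two bin-level influence profiles."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("profiles must be non-zero")
    return float(a @ b / denom)

def most_distinctive_bins(a, b, k):
    """Indices of the k bins where profile `a` most exceeds profile `b`.

    Each profile is z-scored first so that benchmarks with different
    overall influence magnitudes stay comparable (an illustrative choice).
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.std() == 0 or b.std() == 0:
        raise ValueError("profiles must not be constant")
    diff = (a - a.mean()) / a.std() - (b - b.mean()) / b.std()
    # Sort descending by difference and keep the first k flat bin indices.
    return np.argsort(diff)[::-1][:k]

# Toy profiles over four hypothetical bins.
social_profile = [1.0, 0.0, 0.0, 0.0]
stem_profile = [0.0, 0.0, 0.0, 1.0]
similarity = cosine_similarity(social_profile, stem_profile)
social_leaning = most_distinctive_bins(social_profile, stem_profile, k=1)
```

A low similarity between a social and a STEM benchmark would be consistent with the separation the study describes, although the threshold for "qualitatively distinct" remains a judgment call.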

Method

Apply gradient-based attribution (TrackStar/Bergson) to a stratified corpus, aggregate influence by topic-format bins (WebOrganizer), and validate findings with targeted machine unlearning.
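
The aggregation step can be sketched as follows. This is a minimal illustration, not the released pipeline: it assumes per-document influence scores have already been computed by an attribution tool, that each document carries integer topic and format labels, and that the 576 bins factor as 24 topics by 24 formats. The function name `aggregate_by_bin` and the use of a per-bin mean are hypothetical choices.

```python
import numpy as np

N_TOPICS = 24   # assumption: 576 bins = 24 topics x 24 formats
N_FORMATS = 24

def aggregate_by_bin(scores, topic_ids, format_ids,
                     n_topics=N_TOPICS, n_formats=N_FORMATS):
    """Average per-document influence scores within each topic-format bin.

    Returns (mean_influence, doc_counts), both shaped (n_topics, n_formats).
    Bins with no documents get a mean of 0.0.
    """
    scores = np.asarray(scores, dtype=float)
    topic_ids = np.asarray(topic_ids, dtype=int)
    format_ids = np.asarray(format_ids, dtype=int)
    n_bins = n_topics * n_formats
    # Map each (topic, format) pair to a single flat bin index.
    flat = topic_ids * n_formats + format_ids
    sums = np.bincount(flat, weights=scores, minlength=n_bins)
    counts = np.bincount(flat, minlength=n_bins)
    means = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
    return means.reshape(n_topics, n_formats), counts.reshape(n_topics, n_formats)

# Toy input: three documents, two of them in the same bin.
mean_influence, doc_counts = aggregate_by_bin(
    scores=[1.0, 3.0, 5.0],
    topic_ids=[0, 0, 2],
    format_ids=[1, 1, 0],
)
```

Averaging rather than summing keeps heavily populated bins from dominating purely through document count. Whether the study normalizes this way is not stated in this summary.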

In practice

Use bin-level attribution for data curation and auditing.
Identify specific corpus regions for model behavior modification.
Employ targeted unlearning as a causal check on attribution findings, comparing against random baselines.
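
The validation logic in the last item can be sketched as a comparison between a "forget" set of high-attribution bins and a size-matched random control set. The selection rule, the function names, and the accuracy figures below are hypothetical; the unlearning step itself is out of scope here and would be performed by whatever unlearning method is in use.

```python
import numpy as np

def select_forget_and_control_bins(bin_influence, k, seed=0):
    """Pick the k highest-influence bins plus k random bins from the rest."""
    flat = np.asarray(bin_influence, dtype=float).ravel()
    if not 0 < k <= flat.size // 2:
        raise ValueError("k must be positive and at most half the bin count")
    order = np.argsort(flat)[::-1]          # bins sorted by influence, descending
    forget = order[:k]
    rng = np.random.default_rng(seed)
    # Control bins are drawn only from bins outside the forget set.
    control = rng.choice(order[k:], size=k, replace=False)
    return np.sort(forget), np.sort(control)

def excess_degradation(base_acc, forget_acc, control_acc):
    """Accuracy drop from targeted forgetting beyond the random-control drop."""
    return (base_acc - forget_acc) - (base_acc - control_acc)

# Toy influence matrix where influence simply grows with the flat bin index.
toy_influence = np.arange(576, dtype=float).reshape(24, 24)
forget_bins, control_bins = select_forget_and_control_bins(toy_influence, k=3)

# Made-up accuracies, used only to show how the comparison is computed.
gap = excess_degradation(base_acc=0.70, forget_acc=0.60, control_acc=0.68)
```

A positive gap on the benchmark aligned with the forgotten bins would support the attribution, and repeating the control draw over several seeds would give a sense of how much of that gap is noise.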

Topics

Training Data Attribution
Language Model Capabilities
Machine Unlearning
Social Reasoning
STEM Reasoning
OLMo3-7B
Dolma3 Corpus

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.