Where Does Social Reasoning Come From? Capability Provenance in Language Models
Summary
A study by Glenn Matlin et al. investigates the provenance of social and STEM reasoning capabilities in the OLMo3-7B language model by attributing its predictions to specific regions of the Dolma3 pretraining corpus. Using gradient-based attribution (TrackStar via Bergson) and aggregating influence across WebOrganizer's 576-bin topic-format taxonomy, the research contrasts SocialIQA and MMLU Social Sciences against ARC-Challenge and MMLU STEM. The findings indicate that social and STEM reasoning draw on qualitatively distinct corpus regions, and that this distinction is sharper for reasoning tasks than for knowledge tasks. Targeted machine unlearning partially validates these associations: forgetting high-attribution topic bins degrades the aligned benchmarks more than random baselines do. All code, sampling manifests, the bin-level influence matrix, and unlearning checkpoints are openly released.
Key takeaway
For AI scientists and machine learning engineers focused on model transparency and control, this research offers a concrete methodology: taxonomic training-data attribution combined with targeted unlearning to audit and refine models. The approach helps pinpoint which data regions drive specific capabilities, enabling more precise data curation, behavior modification, and safety-oriented auditing of large language models.
Key insights
Training-data attribution reveals distinct corpus regions supporting social versus STEM reasoning in LLMs.
Principles
- Capability provenance can be mapped to aggregate corpus regions.
- Reasoning tasks draw from broader corpus regions than knowledge tasks.
- Structured taxonomic aggregation improves attribution interpretability.
Method
Apply gradient-based attribution (TrackStar/Bergson) to a stratified corpus, aggregate influence by topic-format bins (WebOrganizer), and validate findings with targeted machine unlearning.
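The aggregation step can be illustrated with a minimal sketch. It assumes per-document influence scores have already been computed by some attribution tool and that each document carries a (topic, format) label. The function name, the input dictionaries, and the choice of the mean as the aggregate are all hypothetical; this is not the paper's pipeline or the Bergson API.

```python
from collections import defaultdict


def aggregate_by_bin(doc_scores, doc_bins):
    """Average per-document influence within each (topic, format) bin.

    doc_scores: dict mapping doc_id -> influence score (assumed precomputed).
    doc_bins:   dict mapping doc_id -> (topic, format) label (assumed given).
    Returns a dict mapping (topic, format) -> mean influence.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc_id, score in doc_scores.items():
        bin_key = doc_bins[doc_id]
        totals[bin_key] += score
        counts[bin_key] += 1
    # Mean rather than sum, so bins with many documents are not favored
    # purely by size (an illustrative choice, not the paper's stated one).
    return {b: totals[b] / counts[b] for b in totals}


# Toy example with made-up scores and labels.
doc_scores = {"d1": 0.5, "d2": 1.5, "d3": -1.0}
doc_bins = {
    "d1": ("health", "qa"),
    "d2": ("health", "qa"),
    "d3": ("science", "tutorial"),
}
bin_influence = aggregate_by_bin(doc_scores, doc_bins)
```

Running this once per benchmark and stacking the results would give a benchmark-by-bin table, which is one plausible shape for a bin-level influence matrix.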
In practice
- Use bin-level attribution for data curation and auditing.
- Identify specific corpus regions for model behavior modification.
- Employ unlearning to causally validate attribution findings.
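One way to set up the validation comparison is to pick the highest-attribution bins as the forget set and draw a size-matched random set of other bins as a baseline. The sketch below assumes a dictionary of bin-level influence values is available; the function name, the seed handling, and the decision to exclude targeted bins from the baseline pool are illustrative assumptions, not the authors' protocol.

```python
import random


def select_forget_bins(bin_influence, k, seed=0):
    """Return (targeted, baseline) lists of bin keys.

    targeted: the k bins with the highest influence.
    baseline: up to k bins sampled uniformly from the remaining bins
              (a hypothetical random control of matching size).
    """
    ranked = sorted(bin_influence, key=bin_influence.get, reverse=True)
    targeted = ranked[:k]
    remaining = ranked[k:]
    rng = random.Random(seed)  # local RNG so the draw is reproducible
    baseline = rng.sample(remaining, min(k, len(remaining)))
    return targeted, baseline


# Toy influence values for five made-up bins.
toy_influence = {
    ("social", "forum"): 0.9,
    ("social", "qa"): 0.7,
    ("science", "paper"): 0.2,
    ("finance", "news"): 0.1,
    ("sports", "news"): -0.3,
}
targeted, baseline = select_forget_bins(toy_influence, k=2)
```

Unlearning would then be run separately on each set, and the drop on the aligned benchmark compared between the two runs.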
Topics
- Training Data Attribution
- Language Model Capabilities
- Machine Unlearning
- Social Reasoning
- STEM Reasoning
- OLMo3-7B
- Dolma3 Corpus
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.