Where Does Social Reasoning Come From? Capability Provenance in Language Models
Summary
This study traces where social and STEM reasoning capabilities in the OLMo3-7B language model come from, using training-data attribution. The researchers applied gradient-based attribution (TrackStar via Bergson) to a working set drawn from the de-duplicated Dolma3 mix and aggregated influence across WebOrganizer's 24-format × 24-topic taxonomy (576 bins). Contrasting SocialIQA and MMLU Social Sciences with ARC-Challenge and MMLU STEM showed that social and STEM reasoning capabilities originate from qualitatively distinct corpus regions, and that this separation is more pronounced for reasoning than for factual knowledge. Targeted machine unlearning provided partial causal validation: removing high-attribution topic bins, such as Literature for SocialIQA, degrades the aligned benchmark more than random baselines do. All code, sampling manifests, the bin-level influence matrix, and unlearning checkpoints are open-sourced.
Key takeaway
If you develop or fine-tune large language models, capability provenance matters. This research indicates that social and STEM reasoning capabilities stem from distinct regions of the training data, so the choice of data sources deserves attention when you want to strengthen or suppress a particular type of reasoning. Attribution methods and targeted unlearning can help you diagnose and refine model behavior, making outcomes for specialized applications more predictable and controllable.
Key insights
Training data attribution reveals distinct corpus origins for social versus STEM reasoning capabilities in large language models.
Principles
- Training data attribution can pinpoint capability origins.
- Reasoning capabilities show sharper data distinctions than knowledge.
- Targeted unlearning offers causal validation for attribution.
Method
Gradient-based attribution (TrackStar via Bergson) on Dolma3, aggregated via WebOrganizer's 576-bin taxonomy, contrasting benchmark pairs for domain (social/STEM) and capability (reasoning/knowledge).
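The aggregation and contrast step can be sketched roughly as follows. This is a minimal illustration, not the study's code: it assumes per-document influence scores have already been computed by an attribution tool, and that each document carries integer format and topic labels in the range 0 to 23. The function names (`aggregate_influence`, `contrast`), the normalization choice, and the synthetic data are all hypothetical.

```python
import numpy as np

N_FORMATS, N_TOPICS = 24, 24  # 24 x 24 = 576 bins, as in the taxonomy described above


def aggregate_influence(format_ids, topic_ids, scores):
    """Sum per-document influence scores into a (format, topic) bin matrix."""
    matrix = np.zeros((N_FORMATS, N_TOPICS))
    # np.add.at accumulates correctly when several documents share a bin.
    np.add.at(matrix, (format_ids, topic_ids), scores)
    return matrix


def normalize(matrix):
    """Scale a bin matrix to unit total absolute mass (assumed normalization)."""
    total = np.abs(matrix).sum()
    return matrix / total if total > 0 else matrix


def contrast(matrix_a, matrix_b):
    """Difference of normalized bin matrices; positive entries favor matrix_a."""
    return normalize(matrix_a) - normalize(matrix_b)


if __name__ == "__main__":
    # Synthetic stand-in data; real scores would come from the attribution run.
    rng = np.random.default_rng(0)
    n_docs = 1000
    format_ids = rng.integers(0, N_FORMATS, n_docs)
    topic_ids = rng.integers(0, N_TOPICS, n_docs)
    social = aggregate_influence(format_ids, topic_ids, rng.normal(size=n_docs))
    stem = aggregate_influence(format_ids, topic_ids, rng.normal(size=n_docs))

    diff = contrast(social, stem)
    fmt, topic = np.unravel_index(np.argmax(diff), diff.shape)
    print(f"Bin most skewed toward the first benchmark: format={fmt}, topic={topic}")
```

Normalizing before subtracting keeps a benchmark with larger raw scores from dominating the contrast; other normalizations (for example, per-benchmark z-scores) would be equally plausible here.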
In practice
- Employ TrackStar for training data influence analysis.
- Map corpus regions to specific model capabilities.
- Use targeted unlearning to validate data-capability links.
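One way to set up the unlearning comparison is sketched below. It is an assumption-laden illustration rather than the study's procedure: it ranks topics by summing a bin-level influence matrix over formats, draws a size-matched random set of topics as the baseline, and compares accuracy drops. The helper names (`top_topics`, `random_topics`, `excess_drop`) are hypothetical, and the unlearning and evaluation runs themselves are left out.

```python
import numpy as np


def top_topics(matrix, k):
    """Return the k topic indices with the highest influence summed over formats."""
    topic_scores = matrix.sum(axis=0)  # axis 0 is the format axis (assumed layout)
    order = np.argsort(topic_scores)[::-1]
    return [int(t) for t in order[:k]]


def random_topics(n_topics, k, exclude, seed=0):
    """Sample k baseline topics, excluding the high-attribution ones."""
    rng = np.random.default_rng(seed)
    excluded = set(exclude)
    candidates = [t for t in range(n_topics) if t not in excluded]
    return sorted(int(t) for t in rng.choice(candidates, size=k, replace=False))


def excess_drop(base_acc, targeted_acc, random_accs):
    """Accuracy drop from targeted unlearning beyond the mean random-baseline drop."""
    targeted_drop = base_acc - targeted_acc
    random_drop = base_acc - float(np.mean(random_accs))
    return targeted_drop - random_drop


if __name__ == "__main__":
    # Synthetic bin matrix; a real one would come from the attribution step.
    rng = np.random.default_rng(1)
    matrix = rng.normal(size=(24, 24))

    targeted = top_topics(matrix, k=3)
    baseline = random_topics(n_topics=24, k=3, exclude=targeted)
    print("Topics to unlearn:", targeted)
    print("Random baseline topics:", baseline)
    # After unlearning each set and re-evaluating the benchmark, pass the
    # measured accuracies to excess_drop; a positive value means targeted
    # removal hurt the benchmark more than random removal did.
```

Excluding the targeted topics from the random pool keeps the baseline from accidentally overlapping with the high-attribution set; several random draws with different seeds give a less noisy baseline.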
Topics
- Language Models
- Training Data Attribution
- Social Reasoning
- STEM Reasoning
- Machine Unlearning
- OLMo3-7B
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.