Where Does Social Reasoning Come From? Capability Provenance in Language Models

2026-06-17 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A study traces the provenance of social versus STEM reasoning capabilities in the OLMo3-7B language model using training-data attribution. The researchers applied gradient-based attribution (TrackStar via Bergson) to a working set drawn from the de-duplicated Dolma3 mix and aggregated influence across WebOrganizer's 24-format × 24-topic taxonomy (576 bins). Contrasting SocialIQA and MMLU Social Sciences with ARC-Challenge and MMLU STEM, the analysis finds that social and STEM reasoning capabilities originate from qualitatively distinct corpus regions, and that this distinction is more pronounced for reasoning than for factual knowledge. Targeted machine unlearning provides partial causal validation: removing high-attribution topic bins, such as Literature for SocialIQA, degrades the aligned benchmark more than random baselines do. All code, sampling manifests, the bin-level influence matrix, and unlearning checkpoints are open-sourced.

Key takeaway

For AI scientists and machine learning engineers developing or fine-tuning large language models, capability provenance has practical consequences. This research indicates that social and STEM reasoning capabilities stem from distinct training-data regions, so the choice of data sources matters when you aim to strengthen or suppress a particular reasoning type. Attribution methods and targeted unlearning can help diagnose and refine model behavior, which may make outcomes in specialized applications more predictable and controllable.

Key insights

Training data attribution reveals distinct corpus origins for social versus STEM reasoning capabilities in large language models.

Principles

Training data attribution can pinpoint capability origins.
Reasoning capabilities show sharper data distinctions than knowledge.
Targeted unlearning offers partial causal validation for attribution.
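
One way to make "sharper data distinctions" concrete is to compare how similar the social and STEM bin-level influence profiles are at each capability level. The sketch below uses cosine similarity, which is an illustrative choice and not necessarily the measure used in the study; the four-bin profiles are invented toy values, not results.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length influence profiles."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same number of bins")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("profiles must be non-zero")
    return dot / (norm_u * norm_v)


# Hypothetical toy profiles over 4 bins (the real taxonomy has 576).
social_reasoning = [1.0, 0.0, 0.0, 0.0]
stem_reasoning = [0.0, 1.0, 0.0, 0.0]
social_knowledge = [1.0, 1.0, 0.0, 0.0]
stem_knowledge = [1.0, 0.0, 1.0, 0.0]

reasoning_sim = cosine_similarity(social_reasoning, stem_reasoning)
knowledge_sim = cosine_similarity(social_knowledge, stem_knowledge)

# Lower similarity means the two domains draw on more distinct bins.
print(f"reasoning: {reasoning_sim:.2f}, knowledge: {knowledge_sim:.2f}")
```

In this toy setup the reasoning profiles share no bins while the knowledge profiles overlap in one, which is the qualitative pattern the principle describes.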

Method

Gradient-based attribution (TrackStar via Bergson) on Dolma3, aggregated via WebOrganizer's 576-bin taxonomy, contrasting benchmark pairs for domain (social/STEM) and capability (reasoning/knowledge).
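
The aggregation and contrast steps can be sketched in a few lines. This is a minimal illustration that assumes per-document influence scores have already been computed; it does not call TrackStar or Bergson, and the function names, integer bin ids, and toy scores are all hypothetical.

```python
from collections import defaultdict

N_FORMATS = 24  # format classes in the taxonomy
N_TOPICS = 24   # topic classes in the taxonomy


def aggregate_by_bin(records):
    """Sum influence scores per (format_id, topic_id) bin.

    records: iterable of (format_id, topic_id, score) tuples.
    """
    bins = defaultdict(float)
    for format_id, topic_id, score in records:
        if not (0 <= format_id < N_FORMATS and 0 <= topic_id < N_TOPICS):
            raise ValueError(f"bin out of range: ({format_id}, {topic_id})")
        bins[(format_id, topic_id)] += score
    return dict(bins)


def normalize(bins):
    """Scale bin scores so their absolute values sum to 1."""
    total = sum(abs(v) for v in bins.values())
    if total == 0:
        return dict(bins)
    return {k: v / total for k, v in bins.items()}


def contrast(bins_a, bins_b):
    """Per-bin difference (a minus b) over the union of bins."""
    keys = set(bins_a) | set(bins_b)
    return {k: bins_a.get(k, 0.0) - bins_b.get(k, 0.0) for k in keys}


# Invented per-document scores for a social and a STEM benchmark.
social_records = [(0, 3, 0.5), (0, 3, 0.25), (1, 5, 0.25)]
stem_records = [(1, 5, 0.75), (2, 7, 0.25)]

social = normalize(aggregate_by_bin(social_records))
stem = normalize(aggregate_by_bin(stem_records))
diff = contrast(social, stem)

# The bin whose share of influence is most skewed toward the social benchmark.
top_social_bin = max(diff, key=diff.get)
print(top_social_bin, diff[top_social_bin])
```

Normalizing before contrasting keeps a benchmark with larger raw scores from dominating the difference; whether the study normalizes this way is not stated in the summary.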

In practice

Employ TrackStar for training data influence analysis.
Map corpus regions to specific model capabilities.
Use targeted unlearning to validate data-capability links.
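
The validation logic in the last step amounts to comparing the accuracy drop after unlearning a high-attribution bin with the average drop after unlearning randomly chosen bins. The sketch below shows that arithmetic only; the function names and accuracy figures are hypothetical, and the unlearning procedure itself is out of scope.

```python
from statistics import mean


def degradation(base_acc, unlearned_acc):
    """Accuracy lost on a benchmark after unlearning."""
    return base_acc - unlearned_acc


def excess_degradation(base_acc, targeted_acc, random_accs):
    """Targeted drop minus the mean drop across random-bin baselines."""
    if not random_accs:
        raise ValueError("need at least one random baseline run")
    targeted_drop = degradation(base_acc, targeted_acc)
    random_drop = mean(degradation(base_acc, acc) for acc in random_accs)
    return targeted_drop - random_drop


# Invented accuracies for illustration, not figures from the study.
base = 0.75
targeted = 0.5
random_runs = [0.75, 0.625, 0.625, 0.75]

excess = excess_degradation(base, targeted, random_runs)
print(f"excess degradation: {excess:.4f}")
```

A positive value is consistent with the attributed bin mattering more than an arbitrary one; a real analysis would also want variance across random runs before drawing conclusions.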

Topics

Language Models
Training Data Attribution
Social Reasoning
STEM Reasoning
Machine Unlearning
OLMo3-7B

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.