Modern statistics did not emerge in a neutral vacuum. Some of its foundational tools, categories and habits were shaped in a world of colonial administration, racial classification,...
Summary
Iris Clever's warning, highlighted in an NRC interview, is that modern statistics is not neutral: it was shaped by historical forces such as colonial administration, racial classification, and eugenics. This matters for AI because models trained on historically biased datasets can reproduce and scale old discrimination while making it harder to see. The article argues that AI bias is often historical residue rather than only a technical flaw, so a mathematically elegant model can still reanimate discriminatory logic from its training archive. For AI developers, "better data" must therefore mean historically understood data: knowing where it came from, why it was collected, which categories were used, and which groups are missing. The article treats the EU AI Act's data governance requirements as a positive step, but it advocates bringing in humanities and social sciences expertise to address the deeper historical and social implications of data classification.
Key takeaway
Directors of AI/ML who oversee model development must move beyond purely technical bias mitigation. Teams should prioritize "data archaeology": investigating the historical context, original purpose, and collection conditions of every training dataset. This reduces the risk that AI systems inadvertently reanimate historical discrimination, especially in high-stakes domains. Set data provenance requirements and bring in humanities expertise to build AI that is trustworthy and socially literate.
Key insights
AI trustworthiness requires matching statistical power with historical memory, institutional accountability, and democratic control.
Principles
- Statistics and data classification are not neutral.
- AI bias is often historical residue, not just technical.
- Removing sensitive categories can hide, not cure, bias.
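The last principle can be illustrated with a small sketch. It is not taken from the article: the records are synthetic toy data invented for illustration, and the field names (`group`, `postcode`, `approved`) and helper names are hypothetical. It shows that after a sensitive column is dropped, a remaining column may still let someone guess the dropped value for most rows.

```python
from collections import Counter, defaultdict

# Synthetic toy records, invented purely for illustration (not real data).
records = [
    {"group": "A", "postcode": "1000", "approved": 1},
    {"group": "A", "postcode": "1000", "approved": 1},
    {"group": "A", "postcode": "1000", "approved": 1},
    {"group": "B", "postcode": "1000", "approved": 0},
    {"group": "B", "postcode": "2000", "approved": 0},
    {"group": "B", "postcode": "2000", "approved": 0},
    {"group": "B", "postcode": "2000", "approved": 1},
    {"group": "A", "postcode": "2000", "approved": 1},
]


def drop_field(rows, field):
    """Return copies of the rows without the given field."""
    return [{k: v for k, v in row.items() if k != field} for row in rows]


def proxy_accuracy(rows, proxy, sensitive):
    """Share of rows whose sensitive value is recovered by guessing the
    most common sensitive value observed for each proxy value."""
    by_proxy = defaultdict(Counter)
    for row in rows:
        by_proxy[row[proxy]][row[sensitive]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_proxy.values())
    return correct / len(rows)


# The "blinded" data no longer contains the sensitive column ...
blinded = drop_field(records, "group")

# ... but the postcode alone still recovers the group for 6 of 8 rows.
leak = proxy_accuracy(records, "postcode", "group")
```

In this toy data `leak` is 0.75, so a model trained on `blinded` could still act on group membership through `postcode`. A real audit would need a more careful measure than this majority-vote guess.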
Method
Data governance must become data archaeology: it should ask humanities questions about how each category was created, who benefited from it, who was harmed by it, and what its political implications are.
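One way to make such questions routine is to record the answers next to each dataset. The sketch below is an assumption about how a team might do this, not a format proposed by the article or required by the EU AI Act. The class name, field names, and the `open_questions` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetProvenance:
    """Hypothetical provenance record; every field name is illustrative."""
    name: str
    origin: str = ""                      # who collected the data, where, and when
    collection_purpose: str = ""          # why it was collected in the first place
    category_definitions: dict = field(default_factory=dict)  # category -> how it was defined
    missing_groups: list = field(default_factory=list)        # groups known to be absent
    missing_groups_reviewed: bool = False  # True once someone has actually checked


def open_questions(record):
    """List the provenance questions that are still unanswered."""
    questions = []
    if not record.origin:
        questions.append("origin")
    if not record.collection_purpose:
        questions.append("collection_purpose")
    if not record.category_definitions:
        questions.append("category_definitions")
    if not record.missing_groups_reviewed:
        questions.append("missing_groups_reviewed")
    return questions


# Example with placeholder values: only the origin has been documented so far.
example = DatasetProvenance(name="toy-dataset", origin="hypothetical registry")
todo = open_questions(example)
```

An empty `missing_groups` list is ambiguous on its own, which is why the sketch tracks the review separately in `missing_groups_reviewed`.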
In practice
- Demand dataset documentation and subgroup performance testing.
- Examine data origin, collection purpose, and category assumptions.
- Integrate historians, sociologists, and archivists into AI teams.
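Subgroup performance testing, mentioned in the first item above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a complete fairness audit: the labels, predictions, and group names are synthetic, the function names are hypothetical, and accuracy is only one of several metrics a team might compare.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each subgroup label."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {group: hits[group] / totals[group] for group in totals}


def largest_gap(per_group):
    """Difference between the best and worst subgroup score."""
    scores = list(per_group.values())
    return max(scores) - min(scores)


# Synthetic toy labels and predictions, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = subgroup_accuracy(y_true, y_pred, groups)
gap = largest_gap(per_group)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here the overall accuracy is 0.625, but group A scores 1.0 and group B 0.25. The aggregate figure hides that disparity, which is the reason for reporting results per subgroup.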
Topics
- AI Bias
- Data Provenance
- Historical Data Bias
- Responsible AI
- EU AI Act
- Humanities in AI Ethics
Best for: CTO, VP of Engineering/Data, Executive, AI Ethicist, Policy Maker, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Pascal’s Substack.