Modern statistics did not emerge in a neutral vacuum. Some of its foundational tools, categories and habits were shaped in a world of colonial administration, racial classification,...

2025-11-28 · Source: Pascal’s Substack · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

Iris Clever's warning, highlighted in an NRC interview, is that modern statistics is not neutral: its tools and categories were shaped by colonial administration, racial classification, and eugenics. This matters for AI because models trained on historically biased datasets can reproduce old discrimination at scale while making it harder to see. The article argues that AI bias is often historical residue rather than only a technical flaw, so a mathematically elegant model can still reanimate the discriminatory logic embedded in its training archive. For AI developers, "better data" therefore has to mean historically understood data: knowing where it came from, why it was collected, which categories were used, and which groups are missing. The article treats the EU AI Act's data governance requirements as a positive step, but argues for bringing in humanities and social science expertise to address the deeper historical and social implications of data classification.

Key takeaway

Directors of AI/ML overseeing model development should move beyond purely technical bias mitigation. Teams should prioritize "data archaeology": investigating the historical context, original purpose, and collection conditions of every training dataset. This reduces the risk that AI systems inadvertently reanimate historical discrimination, especially in high-stakes domains. Set data provenance requirements and bring in humanities expertise to build trustworthy, socially literate AI.

Key insights

AI trustworthiness requires matching statistical power with historical memory, institutional accountability, and democratic control.

Principles

Statistics and data classification are not neutral.
AI bias is often historical residue, not just technical.
Removing sensitive categories can hide, not cure, bias.
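
The third principle can be illustrated with a toy sketch. The rows, the district names, and the majority-vote "rule" below are all invented for illustration and do not come from the article. The sketch assumes a historical archive in which district happens to track group membership, then learns a decision rule that never sees the group column.

```python
from collections import defaultdict

# Synthetic, made-up rows: (group, district, historically_approved).
# District correlates with group, so it can act as a proxy.
ROWS = [
    ("a", "north", 1), ("a", "north", 1), ("a", "north", 1), ("a", "south", 0),
    ("b", "south", 0), ("b", "south", 0), ("b", "south", 1), ("b", "north", 1),
]

def train_blind_rule(rows):
    """Learn approve/deny per district from past outcomes, ignoring group."""
    stats = defaultdict(lambda: [0, 0])  # district -> [approved, total]
    for _group, district, approved in rows:
        stats[district][0] += approved
        stats[district][1] += 1
    # Approve a district when at least half of its past cases were approved.
    return {d: (a / n) >= 0.5 for d, (a, n) in stats.items()}

def approval_rate_by_group(rows, rule):
    """Apply the group-blind rule, then audit its outcomes per group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, district, _past in rows:
        totals[group] += 1
        approved[group] += int(rule[district])
    return {g: approved[g] / totals[g] for g in totals}

rule = train_blind_rule(ROWS)
rates = approval_rate_by_group(ROWS, rule)
print(rule)   # which districts the rule approves
print(rates)  # approval rate per group under the group-blind rule
```

In this toy data the rule approves one district and denies the other, and the two groups still end up with very different approval rates even though group was never an input. Dropping the sensitive column only removed the ability to see the pattern from inside the model.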

Method

Data governance must become data archaeology, integrating humanities questions about category creation, beneficiaries, harms, and political implications.
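
One possible way to make these questions routine is a provenance record that a dataset must carry before it is used for training. The sketch below is an assumption about how a team might structure that record, not a format prescribed by the article or by the EU AI Act; the class name, field names, and example dataset are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List

@dataclass
class DatasetProvenance:
    """Hypothetical record of the questions a data archaeology review asks."""
    name: str
    origin: str = ""                    # where the data came from
    collection_purpose: str = ""        # why it was originally collected
    categories_used: List[str] = field(default_factory=list)
    known_missing_groups: List[str] = field(default_factory=list)
    who_defined_categories: str = ""    # who created the classification
    beneficiaries_and_harms: str = ""   # who gained, who was harmed

def unanswered(record):
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]

# Made-up example: only some questions have been answered so far.
record = DatasetProvenance(
    name="loan_history_v1",
    origin="hypothetical bank archive",
    categories_used=["district", "income_band"],
)
gaps = unanswered(record)
print(gaps)
```

An empty field is treated here as an unanswered question, so a team that has checked and found no missing groups would record that explicitly (for example `["none identified"]`) rather than leave the list empty.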

In practice

Demand dataset documentation and subgroup performance testing.
Examine data origin, collection purpose, and category assumptions.
Integrate historians, sociologists, and archivists into AI teams.
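
Subgroup performance testing, the first item above, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the group labels and predictions are invented, accuracy is only one of several metrics a team might compare, and the function names are hypothetical.

```python
from collections import defaultdict

def subgroup_accuracy(records):
    """records: iterable of (group, y_true, y_pred) tuples."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / totals[g] for g in totals}

def largest_gap(per_group):
    """Difference between the best and worst served subgroup."""
    return max(per_group.values()) - min(per_group.values())

# Made-up evaluation records for two hypothetical subgroups.
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 1),
]

overall = sum(int(t == p) for _g, t, p in records) / len(records)
per_group = subgroup_accuracy(records)
gap = largest_gap(per_group)
print(overall, per_group, gap)
```

The point of the breakdown is that a single overall score can look acceptable while one subgroup is served far worse than another. Deciding which subgroups to test is itself a question about categories, which is where the dataset's history and the humanities expertise listed above come in.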

Topics

AI Bias
Data Provenance
Historical Data Bias
Responsible AI
EU AI Act
Humanities in AI Ethics

Best for: CTO, VP of Engineering/Data, Executive, AI Ethicist, Policy Maker, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Pascal’s Substack.