From Vulnerable Data Subjects to Vulnerabilizing Data Practices: Navigating the Protection Paradox in AI-Based Analyses of Platformized Lives
Summary
This paper introduces a conceptual shift from viewing vulnerability as an inherent trait of data subjects to understanding it as actively created by data practices, particularly within AI-based analyses of platformized lives. The authors develop a "protection paradox" through an AI for Social Good (AI4SG) case study: a journalist's request to use computer vision to quantify child presence in monetized YouTube "family vlogs" for regulatory advocacy. This case reveals how data-driven efforts to protect vulnerable subjects can inadvertently impose new forms of computational exposure, reductionism, and extraction. The paper proposes a reflexive ethics protocol, organized around four critical junctures—dataset design, operationalization, inference, and dissemination—to guide researchers in navigating ethical tensions and preventing well-intentioned work from leading to renewed extraction or exposure. It argues for a reflexive practice that treats data research as "world-making" work, aligning with European data protection governance.
Key takeaway
If you are a research scientist developing AI4SG projects that use platformized data, critically assess how your technical decisions at each pipeline stage (dataset design, operationalization, inference, dissemination) might inadvertently create or amplify vulnerability. Prioritize data minimization, local processing, and nuanced reporting so that protective intentions do not produce new forms of exposure or extraction.
Key insights
Data practices can actively create and amplify vulnerability, even in AI4SG initiatives aimed at protection.
Principles
- Vulnerability is an effect of power, infrastructures, and incentives.
- Data research is "world-making" work, not passive observation.
- Invisibility can be a form of protection.
Method
A reflexive ethics protocol guides AI researchers through dataset design, operationalization, inference, and dissemination, identifying vulnerabilizing factors like exposure and monetization.
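One way to make the four junctures concrete in a project is to keep a small decision log alongside the pipeline. The sketch below is illustrative only: the paper names the junctures, but the class names `Juncture` and `ReflexiveLog`, the methods, and the example note are assumptions made here, not part of the authors' protocol.

```python
from dataclasses import dataclass, field
from enum import Enum


class Juncture(Enum):
    """The four critical junctures named in the summary."""
    DATASET_DESIGN = "dataset design"
    OPERATIONALIZATION = "operationalization"
    INFERENCE = "inference"
    DISSEMINATION = "dissemination"


@dataclass
class ReflexiveLog:
    """Hypothetical helper: stores free-text reflections per juncture."""
    notes: dict[Juncture, list[str]] = field(
        default_factory=lambda: {j: [] for j in Juncture}
    )

    def record(self, juncture: Juncture, note: str) -> None:
        # Append a reflection about a technical decision at this juncture.
        self.notes[juncture].append(note)

    def unreviewed(self) -> list[Juncture]:
        # Junctures with no recorded reflection yet.
        return [j for j in Juncture if not self.notes[j]]


log = ReflexiveLog()
log.record(Juncture.INFERENCE, "Run the vision model locally; no uploads.")
print([j.value for j in log.unreviewed()])
```

A log like this does not replace the reflexive work itself; it only makes visible which stages have not yet been examined.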
In practice
- Use local models on local servers for sensitive data inference.
- Employ qualifying language for model outputs, acknowledging limitations.
- Avoid including screenshots of platform content in publications.
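The first two practices above can be sketched in code: keep only aggregate counts (no video identifiers or frames) and phrase the result with explicit uncertainty. This is a minimal sketch under stated assumptions, not the authors' implementation. The per-video boolean flags are assumed to come from a locally run model, and `summarize`, `hedged_sentence`, and `AggregateEstimate` are hypothetical names. The Wilson interval used here reflects sampling uncertainty only; it does not capture model error.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregateEstimate:
    """Aggregate result only: no video IDs, channels, or frames are kept."""
    n_videos: int
    share: float
    ci_low: float
    ci_high: float


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def summarize(flags: list[bool]) -> AggregateEstimate:
    """Reduce per-video model flags to a single aggregate estimate."""
    n = len(flags)
    k = sum(flags)
    low, high = wilson_interval(k, n)
    return AggregateEstimate(n_videos=n, share=k / n, ci_low=low, ci_high=high)


def hedged_sentence(est: AggregateEstimate) -> str:
    """Report the estimate with qualifying language."""
    return (
        f"In a sample of {est.n_videos} videos, the model flagged an estimated "
        f"{est.share:.0%} as likely showing a child "
        f"(95% interval: {est.ci_low:.0%} to {est.ci_high:.0%}); "
        "these are model outputs, not verified observations."
    )


# Hypothetical flags standing in for local model output.
example_flags = [True] * 30 + [False] * 70
print(hedged_sentence(summarize(example_flags)))
```

Discarding per-video identifiers before reporting is a design choice in the spirit of data minimization: the published figure cannot be traced back to an individual family or child.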
Topics
- Protection Paradox
- AI for Social Good (AI4SG)
- Vulnerabilizing Data Practices
- Reflexive Ethics Protocol
- Platformized Lives
Best for: Research Scientist, AI Scientist, Data Scientist, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.