Data Work Is Too Secretive. Big Tech Should be Held Accountable.

Source: Tech Policy Press · Field: Legal & Regulatory (Compliance & Risk Management, Regulatory Affairs & Government Relations, Corporate Law & Business Legal Services) · Depth: Fundamental Awareness, medium

Summary

A new investigation by the Dutch nonprofit SOMO reveals the opaque network of data work platforms that supply cheap labor to Big Tech companies such as Amazon, Google, Meta, Microsoft, and Nvidia for AI training. The study identified at least 30 such platforms used by these companies, including Sama and Clickworkers. Data workers, often called "ghost" workers, perform microtasks such as collecting selfies, recording daily routines, or photographing sensitive documents, frequently under precarious conditions and for low pay, particularly in the Global South. The article highlights instances of extreme secrecy: workers sign NDAs and often do not know who the final client is or what their tasks are for. Some turned out to be aiding the US military, and others supplied data to projects identified only by a name such as "Spoofy Doo." This lack of transparency masks the extensive human labor that AI development requires and raises significant concerns about labor rights, worker exploitation, and the quality of AI systems.

Key takeaway

For CTOs and VPs of Engineering overseeing AI development, this investigation underscores the need for supply chain transparency in data acquisition. Your teams should demand full disclosure from data work vendors about worker conditions, payment structures, and the ultimate use of the collected data. Prioritize ethical sourcing and audit your data pipelines, both to mitigate reputational risk and to ensure your AI systems are not built on exploitative labor practices. Such practices can also compromise data quality.

Key insights

Big Tech's reliance on opaque data work platforms for AI training exploits "ghost" workers globally.

Method

The Centre for Research on Multinational Corporations (SOMO) mapped 30+ data work platforms used by major tech companies to expose the network supplying AI training data.

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Ethicist, Policy Maker, Legal Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by Tech Policy Press.