Porn, dog poo and social media snaps: the ‘taskers’ scraping the internet for AI firm part-owned by Meta

2026-04-07 · Source: AI (artificial intelligence) | The Guardian · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Human Resources & Workforce Development · Depth: Fundamental Awareness, medium

Summary

Scale AI, a company 49% owned by Meta, uses its Outlier platform to recruit tens of thousands of gig workers, including experts in medicine, physics, and economics, to refine AI systems. These "taskers" report being paid to scrape personal data from Instagram and Facebook accounts, harvest copyrighted artwork, and transcribe pornographic soundtracks, tasks they describe as morally uncomfortable and far removed from high-level AI refinement. Workers expressed concerns about collecting data from users, including minors, without those users' explicit understanding, and about contributing to their own job displacement. The Guardian's investigation, based on interviews with 10 Outlier contractors, found instances of worker monitoring via Hubstaff and allegations of "bait-and-switch" pay tactics. Scale AI, which contracts with the Pentagon and major tech companies, stated that Outlier offers flexible work with transparent pay and that inappropriate content is addressed, though it confirmed using children's public social media data.

Key takeaway

CTOs and VPs of Engineering evaluating AI model development should scrutinize the data sourcing and labeling practices of third-party vendors like Scale AI. Understand the ethical implications and potential legal risks of using data scraped from social media, copyrighted works, and sensitive content. Prioritize vendors with transparent, auditable data governance policies to limit reputational damage and to comply with evolving data privacy regulations, especially those concerning user consent and minors' data.

Key insights

AI training relies on a vast gig workforce performing ethically questionable data collection from public and private sources.

Principles

Data collection for AI training often blurs ethical boundaries.
Gig workers face precarious conditions in the AI economy.

Method

AI models are refined by human "taskers" who label, transcribe, and scrape diverse data, including social media profiles and copyrighted content, often while their activity is monitored.

In practice

Public social media data is actively used for AI training.
AI training involves tasks like transcribing sensitive audio.

Topics

Scale AI
Outlier Platform
AI Data Labeling
Gig Work Ethics
Social Media Scraping

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Ethicist, Policy Maker, General Interest

Editorial summary, takeaway, and curation by AIssential. Original article published by AI (artificial intelligence) | The Guardian.