The complaint is trying to turn a messy cultural argument (“training vs theft”) into a narrower systems argument: “you weren’t allowed to take the files, and you had to bypass controls to do it.”

2025-11-28 · Source: Pascal’s Substack · Field: Legal & Regulatory — Intellectual Property & Patents, Compliance & Risk Management · Depth: Intermediate, medium

Summary

NVIDIA faces a third class-action lawsuit alleging that it illegally harvested YouTube videos to train its video foundation model, "Cosmos." The complaint reframes the grievance from simple copying to "breaking through access controls": plaintiffs allege that NVIDIA bypassed YouTube's technological protection measures (TPMs) to obtain file-level copies. According to the complaint, NVIDIA ran a sophisticated "download-and-ingest" pipeline on 20 to 30 AWS virtual machines, used IP rotation, and relied on tools such as yt-dlp to download the videos referenced by research datasets like HD-VG-130M, HDVILA-100M, and HowTo100M. Plaintiffs describe these datasets as mere pointers (URLs/IDs) rather than collections of actual video files. The strategy centers on the DMCA's anti-circumvention provision (17 U.S.C. § 1201(a)) and aims to establish liability without having to prove copyright infringement, defeat a fair use defense, or show copyright registration.

Key takeaway

For CTOs and legal teams developing AI models, this lawsuit signals a critical shift in legal strategy from copyright infringement to DMCA anti-circumvention. You should re-evaluate your data acquisition pipelines, especially for publicly streamable content, to ensure they do not bypass platform-specific technical protection measures. Proactively audit your training data provenance and consider explicit licensing agreements to mitigate the risk of litigation centered on unauthorized access and circumvention.

Key insights

The lawsuit against NVIDIA frames AI training data acquisition as DMCA circumvention, not just copyright infringement.

Principles

Streaming is not file-level access.
Datasets that contain only pointers (URLs/IDs) require active downloading.
TPM circumvention is a distinct legal claim.

Method

NVIDIA allegedly used 20-30 AWS VMs, IP rotation, and yt-dlp to download YouTube videos at scale, bypassing platform controls to acquire file-level copies for model training.

In practice

Implement robust access controls for content.
Scrutinize research dataset origins.
Document all data acquisition processes.
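The "In practice" steps above can be supported by a per-item provenance log. Below is a minimal sketch, assuming a hypothetical `AcquisitionRecord` schema of our own design (none of these field names come from the source or from any standard). It flags items whose dataset supplied only a pointer (a URL or ID, as plaintiffs say HD-VG-130M, HDVILA-100M, and HowTo100M do) and that lack a documented license basis. It is an audit aid for triage, not a legal test of circumvention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AcquisitionRecord:
    """Hypothetical provenance entry for one training item (illustrative schema)."""
    item_id: str
    source_dataset: str                  # dataset that listed the item, e.g. "HowTo100M"
    is_pointer: bool                     # True if the dataset supplied only a URL/ID, not the file
    acquisition_method: str              # how the file-level copy was obtained (free-text label)
    license_basis: Optional[str] = None  # contract or license reference, if one exists


def needs_review(records: List[AcquisitionRecord]) -> List[str]:
    """Return IDs of items fetched from a pointer with no documented license basis."""
    flagged = []
    for record in records:
        # A pointer-only entry means someone had to actively fetch the file;
        # without a license on record, route it to legal review.
        if record.is_pointer and record.license_basis is None:
            flagged.append(record.item_id)
    return flagged


# Example log with two hypothetical items.
records = [
    AcquisitionRecord("vid-001", "HowTo100M", True, "self_download"),
    AcquisitionRecord("vid-002", "in-house-capture", False, "licensed_delivery", "contract-123"),
]

print(needs_review(records))  # ['vid-001']
```

Keeping the log at the item level, rather than per dataset, makes it possible to answer "how was this specific file obtained?" if a complaint later targets individual works.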

Topics

NVIDIA Lawsuit
DMCA Anti-Circumvention
AI Model Training Data
YouTube Content Extraction
Copyright Litigation

Best for: CTO, VP of Engineering/Data, Director of AI/ML, Legal Professional, AI Ethicist, AI Product Manager


Editorial summary, takeaway, and curation by AIssential. Original article published by Pascal’s Substack.