Doe 1 v. GitHub/Microsoft/OpenAI: Much of the proof—prompt logs, output frequencies, memorization testing, preprocessing pipelines, “cleaning” steps—is uniquely in defendants’ possession...

2025-11-28 · Source: Pascal’s Substack · Field: Legal & Regulatory — Intellectual Property & Patents, Litigation & Dispute Resolution, Compliance & Risk Management · Depth: Advanced, medium

Summary

A Ninth Circuit oral argument in Doe 1 v. GitHub / Microsoft / OpenAI addresses whether AI companies can be held liable under 17 U.S.C. § 1202(b), the DMCA provision on copyright management information (CMI), for stripping CMI during model training. Plaintiffs argue that ingesting code with CMI and then generating outputs without it constitutes "removal" even without identical copying, and suggest that a "substantial similarity" standard applies. Defendants counter that plaintiffs lack standing because the alleged harm is speculative, and that "removal" requires tampering with existing CMI on a copy, not merely omitting attribution in a new output. The judges' questions focused on the plausibility of the alleged harm, the distinction between CMI removal and general attribution, and how specifically "input-stage stripping" was pleaded, as opposed to output-only claims.

Key takeaway

Legal teams and CTOs evaluating litigation risk or developing AI governance policies should build their strategy around the Ninth Circuit's signals on DMCA § 1202(b). Prioritize pleading concrete, quantifiable harm and specific mechanisms of CMI removal during ingestion and training, rather than relying on broad attribution claims. Focus requested relief on operational governance and auditability, in line with the judicial preference for clear, actionable remedies over sweeping mandates.

Key insights

The core dispute is whether DMCA CMI provisions apply to AI model training and output generation.

Principles

Courts are wary of abstract injury claims.
"Removal" of CMI implies tampering, not mere omission.
Pleading intent and knowledge is critical for § 1202(b) claims.

Method

Litigants suing AI makers under DMCA § 1202(b) should plead concrete harm specifically, support plausibility with quantified facts, detail the mechanisms by which CMI is stripped, link removal to infringement risk, and document outputs that match their works but lack the original CMI.
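
As an illustration of the last step, documenting outputs that match a work but lack its CMI, here is a minimal Python sketch. It is not drawn from the case record or the original article. The `CMI_PATTERN` heuristic, the function names `split_cmi` and `compare_output`, and the sample snippet are all hypothetical, and treating comment lines that mention copyright, license, or author as CMI is an assumption, not a legal definition.

```python
import re
from difflib import SequenceMatcher

# Hypothetical heuristic: comment lines mentioning these terms are treated as CMI.
CMI_PATTERN = re.compile(
    r"copyright|\(c\)|spdx-license-identifier|licen[sc]e|author", re.IGNORECASE
)


def split_cmi(text):
    """Split text into (cmi_lines, body_lines), skipping blank lines."""
    cmi, body = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        is_comment = stripped.startswith(("#", "//", "/*", "*"))
        if is_comment and CMI_PATTERN.search(stripped):
            cmi.append(stripped)
        else:
            body.append(stripped)
    return cmi, body


def compare_output(source, output):
    """Report how closely the output body tracks the source body,
    and which CMI lines from the source do not appear in the output."""
    src_cmi, src_body = split_cmi(source)
    out_cmi, out_body = split_cmi(output)
    similarity = SequenceMatcher(
        None, "\n".join(src_body), "\n".join(out_body)
    ).ratio()
    missing = [line for line in src_cmi if line not in out_cmi]
    return {"body_similarity": round(similarity, 3), "missing_cmi": missing}


# Illustrative inputs only; not real evidence.
source = (
    "# Copyright 2020 Jane Doe\n"
    "# SPDX-License-Identifier: MIT\n"
    "def add(a, b):\n"
    "    return a + b\n"
)
output = "def add(a, b):\n    return a + b\n"
result = compare_output(source, output)
print(result)
```

A report like this only records a textual match and an absent notice. It says nothing on its own about where in a pipeline the CMI was lost, or about intent or knowledge.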

In practice

Quantify regurgitation rates with scale facts.
Use realistic prompts in evidence.
Differentiate direct vs. circumstantial evidence.
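
One way to read "quantify regurgitation rates with scale facts" is to pair a measured match rate with a usage volume. The Python sketch below assumes that reading. The test counts and the daily volume are hypothetical placeholders, not figures from the case, and the Wilson score interval is a standard statistical choice made here, not a method attributed to the litigants or the court.

```python
import math


def wilson_interval(hits, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = hits / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def expected_events(rate, volume):
    """Expected number of matching outputs at a given rate and output volume."""
    return rate * volume


# Hypothetical test run: 10 near-verbatim matches in 1,000 realistic prompts.
hits, trials = 10, 1000
low, high = wilson_interval(hits, trials)

# Hypothetical scale fact: one million completions served per day.
daily_volume = 1_000_000
print(f"observed rate: {hits / trials:.4f}")
print(f"interval: {low:.4f} to {high:.4f}")
print(f"expected daily matches: {expected_events(low, daily_volume):.0f} "
      f"to {expected_events(high, daily_volume):.0f}")
```

Reporting the interval along with the point estimate shows how much the conclusion depends on sample size, which bears on the plausibility questions the judges raised.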

Topics

DMCA
Copyright Law
AI Ethics
Generative AI
Legal Precedent

Best for: CTO, VP of Engineering/Data, Director of AI/ML, Legal Professional, AI Ethicist, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by Pascal’s Substack.