Doe 1 v. Github/Microsoft/OpenAI: Much of the proof—prompt logs, output frequencies, memorization testing, preprocessing pipelines, “cleaning” steps—is uniquely in defendants’ possession...
Summary
A Ninth Circuit oral argument in Doe 1 v. GitHub / Microsoft / OpenAI addresses whether AI companies can be held liable under 17 U.S.C. § 1202(b), the DMCA provision on copyright management information (CMI), for stripping CMI during model training. Plaintiffs argue that ingesting code that carries CMI and then generating outputs without it constitutes "removal" even when the output is not an identical copy, and they suggest that a "substantial similarity" standard should apply instead. Defendants counter that plaintiffs lack standing because the alleged harm is speculative, and that "removal" requires tampering with existing CMI on a copy, not merely omitting attribution from a new output. The judges' questions focused on the plausibility of the alleged harm, the distinction between CMI removal and general attribution, and how specifically "input-stage stripping" was pleaded, as opposed to output-only claims.
Key takeaway
Legal teams and CTOs evaluating litigation risk or developing AI governance policies should account for the Ninth Circuit's signals on DMCA § 1202(b). Prioritize pleading concrete, quantifiable harm and specific mechanisms of CMI removal during ingestion and training, rather than relying on broad attribution claims. Requested relief should focus on operational governance and auditability, in line with the judicial preference for clear, actionable remedies over sweeping mandates.
Key insights
The core dispute is whether DMCA CMI provisions apply to AI model training and output generation.
Principles
- Courts are wary of abstract injury claims.
- "Removal" of CMI implies tampering, not mere omission.
- Pleading intent/knowledge is critical for § 1202(b) claims.
Method
Litigants suing AI makers under DMCA § 1202(b) should plead concrete harm, quantify how plausible that harm is, detail the mechanisms by which CMI was stripped, link the removal to infringement risk, and document outputs that match their works but lack the original CMI.
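As an illustration of the last step, documenting outputs that match a work but lack its CMI, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the regex patterns are a rough stand-in for CMI (the statutory definition is broader), the 0.9 similarity threshold is arbitrary, and `compare` is a hypothetical helper, not a method used by either party or endorsed by the court.

```python
import re
from difflib import SequenceMatcher

# Assumed, simplified CMI patterns (copyright notice, SPDX license tag, author tag).
CMI_PATTERNS = [
    re.compile(r"copyright\s+(\(c\)|©)?\s*\d{4}", re.IGNORECASE),
    re.compile(r"SPDX-License-Identifier:", re.IGNORECASE),
    re.compile(r"@author\b", re.IGNORECASE),
]


def cmi_lines(text):
    """Return the lines of `text` that match any assumed CMI pattern."""
    return [ln for ln in text.splitlines()
            if any(p.search(ln) for p in CMI_PATTERNS)]


def strip_cmi(text):
    """Return `text` without its assumed CMI lines, for body-only comparison."""
    found = set(cmi_lines(text))
    return "\n".join(ln for ln in text.splitlines() if ln not in found)


def compare(source, output, threshold=0.9):
    """Flag an output whose body closely matches the source but carries no CMI.

    `threshold` is an arbitrary illustrative cutoff, not a legal standard.
    """
    similarity = SequenceMatcher(None, strip_cmi(source), strip_cmi(output)).ratio()
    source_has = bool(cmi_lines(source))
    output_has = bool(cmi_lines(output))
    return {
        "source_has_cmi": source_has,
        "output_has_cmi": output_has,
        "body_similarity": similarity,
        "flag": source_has and not output_has and similarity >= threshold,
    }


# Hypothetical example: a source file with a notice, and an output without one.
source = (
    "# Copyright (c) 2020 Jane Doe\n"
    "# SPDX-License-Identifier: MIT\n"
    "def add(a, b):\n"
    "    return a + b\n"
)
output = "def add(a, b):\n    return a + b\n"
result = compare(source, output)
```

A record like `result`, stored with the prompt that produced the output, is one way such matches could be logged. Whether a close but non-identical match satisfies § 1202(b) is the open legal question the argument turned on, so the similarity score is descriptive only.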
In practice
- Quantify regurgitation rates with scale facts.
- Use realistic prompts in evidence.
- Differentiate direct vs. circumstantial evidence.
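One way to quantify a regurgitation rate and tie it to scale is to report a confidence interval for the observed match rate and project it onto a usage figure. The sketch below uses a standard Wilson score interval. The function names, the counts passed to them, and the usage figure are all hypothetical, and the projection assumes the test prompts are representative of real use, which is why realistic prompts matter.

```python
import math


def wilson_interval(hits, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = hits / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def projected_matches(hits, trials, total_outputs):
    """Scale the interval to a total output count (a hypothetical usage figure)."""
    low, high = wilson_interval(hits, trials)
    return low * total_outputs, high * total_outputs


# Hypothetical numbers: 10 matching outputs observed in 1,000 test prompts,
# projected onto an assumed 1,000,000 outputs served.
low, high = wilson_interval(10, 1000)
low_n, high_n = projected_matches(10, 1000, 1_000_000)
```

Reporting the interval instead of a single rate makes clear how much of the estimate depends on sample size, which bears on the plausibility questions raised at argument.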
Topics
- DMCA
- Copyright Law
- AI Ethics
- Generative AI
- Legal Precedent
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Legal Professional, AI Ethicist, Policy Maker
Editorial summary, takeaway, and curation by AIssential. Original article published by Pascal’s Substack.