Kleiner v. Adobe is another step in a pattern: the legal system is increasingly treating “training data governance” as a compliance domain, not a research footnote.

2025-11-28 · Source: Pascal’s Substack · Field: Legal & Regulatory — Intellectual Property & Patents, Compliance & Risk Management, Litigation & Dispute Resolution · Depth: Advanced, medium

Summary

The proposed class action Kleiner v. Adobe, filed February 9, 2026, in the Northern District of California, alleges that Adobe Inc. trained its SlimLM small language models on large-scale, unlicensed copies of copyrighted books, including author Arthur Kleiner's registered work. The complaint advances a "dataset supply chain" theory of infringement: SlimLM was allegedly trained on SlimPajama-627B, a dataset derived from RedPajama, which in turn allegedly incorporated Books3, a corpus associated with pirated books from shadow libraries. The suit alleges direct copyright infringement under 17 U.S.C. § 501 and seeks damages, attorneys' fees, injunctive relief, and destruction of infringing copies under 17 U.S.C. § 503(b). The complaint also contrasts Adobe's "ethical AI" marketing with its alleged reliance on datasets known to be unlicensed.

Key takeaway

For CTOs and VPs of Engineering integrating language models, this lawsuit signals that you cannot outsource legal risk to open dataset supply chains. Your teams must implement robust data governance and provenance tracking for all training data, especially for commercialized models. Verifying the licensing and origin of every upstream dataset, including whether any of them transitively incorporates a corpus such as Books3, is crucial to mitigating copyright-infringement liability and reputational damage.
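
Provenance tracking of this kind can be sketched as a walk over a lineage graph. The minimal Python sketch below assumes a hypothetical internal registry in which each artifact (model or dataset) lists its direct upstream sources; the chain from SlimLM to Books3 mirrors the complaint's allegations, not verified facts, and the names `Artifact`, `audit_lineage`, and the blocklist are illustrative assumptions. It reports blocklisted ancestors anywhere upstream, plus upstream names with no registry entry, since an undocumented source is itself a governance gap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A model or dataset in a hypothetical internal provenance registry."""
    name: str
    derived_from: tuple[str, ...] = ()  # direct upstream datasets only


def audit_lineage(
    registry: dict[str, Artifact], root: str, blocklist: set[str]
) -> tuple[list[str], list[str]]:
    """Walk all upstream sources of `root` transitively.

    Returns (tainted, undocumented):
      tainted: upstream names that appear on the blocklist
      undocumented: upstream names with no registry entry (provenance gaps)
    """
    seen: set[str] = set()
    undocumented: set[str] = set()
    stack = list(registry[root].derived_from)
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # skip shared ancestors and guard against cycles
        seen.add(name)
        record = registry.get(name)
        if record is None:
            undocumented.add(name)  # we cannot see past an unregistered source
            continue
        stack.extend(record.derived_from)
    return sorted(seen & blocklist), sorted(undocumented)


# Lineage as alleged in the complaint (hypothetical registry entries);
# other upstream components of RedPajama are deliberately omitted.
registry = {
    a.name: a
    for a in [
        Artifact("SlimLM", ("SlimPajama-627B",)),
        Artifact("SlimPajama-627B", ("RedPajama",)),
        Artifact("RedPajama", ("Books3",)),
        Artifact("Books3"),
    ]
}

tainted, gaps = audit_lineage(registry, "SlimLM", blocklist={"Books3"})
print("tainted upstream:", tainted)
print("undocumented upstream:", gaps)
```

The key design choice is transitivity: checking only direct sources would clear SlimLM, because Books3 sits three hops upstream, behind SlimPajama-627B and RedPajama.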

Key insights

AI training data governance is evolving into a critical legal compliance domain, not merely a research consideration.

Principles

Dataset supply chain liability is emerging.
Fair use defense is fact-intensive.
Piracy-tainted sources increase litigation risk.

In practice

Audit dataset provenance rigorously.
Document internal data governance.
Filter out known pirated content.
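
Filtering known pirated content can start with simple fingerprint matching. The sketch below is a minimal illustration, assuming you already hold a hypothetical blocklist of SHA-256 fingerprints for texts identified as pirated; the normalization rules (Unicode NFKC, lowercasing, collapsing whitespace) and the helper names `fingerprint` and `filter_blocked` are assumptions, not an industry standard. Exact hashing only catches verbatim copies, so a production pipeline would likely add near-duplicate detection (for example MinHash) for excerpts and reformatted versions.

```python
from __future__ import annotations

import hashlib
import unicodedata


def fingerprint(text: str) -> str:
    """SHA-256 of a normalized document (NFKC, lowercased, whitespace collapsed)."""
    normalized = unicodedata.normalize("NFKC", text).lower()
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_blocked(docs: list[str], blocked: set[str]) -> tuple[list[str], list[str]]:
    """Split docs into (kept, removed) by exact match on normalized fingerprints."""
    kept: list[str] = []
    removed: list[str] = []
    for doc in docs:
        if fingerprint(doc) in blocked:
            removed.append(doc)  # retained for the audit trail, not silently dropped
        else:
            kept.append(doc)
    return kept, removed


# Hypothetical blocklist built from texts already identified as pirated.
blocked = {fingerprint("Text Of A Pirated Book")}
docs = ["text of a  pirated\nbook", "A public-domain essay."]
kept, removed = filter_blocked(docs, blocked)
print(f"kept {len(kept)}, removed {len(removed)}")
```

Returning the removed documents rather than discarding them leaves a record that can feed the internal data-governance documentation recommended above.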

Topics

AI Training Data
Copyright Infringement
Fair Use
Data Governance
Small Language Models

Best for: CTO, VP of Engineering/Data, Executive, Legal Professional, AI Ethicist, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Pascal’s Substack.