Encyclopedia Britannica sues OpenAI for training on nearly 100,000 articles without permission

2026-03-16 · Source: The Decoder · Field: Legal & Regulatory — Intellectual Property & Patents, Litigation & Dispute Resolution, AI Legal & Ethics · Depth: Intermediate, quick

Summary

Encyclopedia Britannica and its subsidiary Merriam-Webster have sued OpenAI in federal court in Manhattan, alleging that OpenAI used nearly 100,000 online articles, encyclopedia entries, and dictionary definitions without permission to train its AI models. The complaint, first reported by Reuters, claims that ChatGPT can produce near-verbatim copies of Britannica content, diverting users from Britannica's own websites. Britannica also accuses OpenAI of trademark infringement, asserting that ChatGPT's responses create a false impression of endorsement and inaccurately cite Britannica as a source. The lawsuit seeks damages and an injunction, alleging that GPT-4 has "memorized" significant portions of Britannica's copyrighted content and can reproduce them on demand. The case feeds a broader debate over whether AI models "store" copyrighted works in their parameters, a question on which a court in Munich and the UK High Court have reached differing rulings.

Key takeaway

For CTOs and legal teams evaluating AI model deployment, this lawsuit underscores the need to scrutinize training data provenance and copyright infringement risk. Implement content filtering and attribution mechanisms to limit the reproduction of copyrighted material and reduce legal exposure from "memorized" data. Check your AI models' outputs for verbatim content proactively to avoid similar litigation and reputational damage.

Key insights

AI models' ability to reproduce copyrighted content from training data is a central legal and technical challenge.

Principles

AI model weights can embed reproducible content.
Near-verbatim output is cited as evidence of unauthorized copying.

In practice

Audit AI model outputs for verbatim content.
Review training data licensing agreements.
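
The output audit listed above can be sketched as a word n-gram overlap check. This is a minimal illustration, not a method described in the complaint or the original article. The function names, the n-gram length, and the 0.5 threshold are all hypothetical choices, and it assumes you already hold a licensed copy of the reference texts to compare against.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, reference: str, n: int = 8) -> float:
    """Fraction of the output's word n-grams that also occur in the reference."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        # Output is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(out_grams & word_ngrams(reference, n)) / len(out_grams)


def flag_outputs(
    outputs: list[str],
    references: list[str],
    n: int = 8,
    threshold: float = 0.5,  # assumed cutoff; tune on your own data
) -> list[tuple[int, float]]:
    """Return (index, best score) for outputs at or above the threshold."""
    flagged = []
    for i, output in enumerate(outputs):
        best = max(
            (verbatim_overlap(output, ref, n) for ref in references),
            default=0.0,
        )
        if best >= threshold:
            flagged.append((i, best))
    return flagged
```

A check like this only catches near-exact reuse of wording. It does not detect paraphrase, and a high score is a prompt for legal review rather than a finding of infringement.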

Topics

Copyright Infringement
AI Model Training
Large Language Models
Legal Disputes
Data Memorization

Best for: CTO, VP of Engineering/Data, Director of AI/ML, Legal Professional, AI Ethicist, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.