Can coding agents relicense open source through a “clean room” implementation of code?
Summary
The maintainers of `chardet`, a Python library LGPL-licensed since 2006, recently released version 7.0.0 under the MIT license and described it as a "ground-up rewrite." The move sparked a legal and ethical debate. Original author Mark Pilgrim asserts that the release violates the LGPL, arguing that maintainer Dan Blanchard's extensive exposure to the original code rules out a "clean room" implementation. Blanchard, who has maintained `chardet` since 2012, contends that the new code is structurally independent and cites JPlag results showing less than 1.3% similarity to previous versions. He describes generating the new version with Claude Code, starting in an empty repository with explicit instructions to avoid LGPL/GPL-licensed code. The case raises open questions about AI-assisted code rewrites, derivative works, and open-source relicensing.
Key takeaway
Engineering leaders evaluating AI-assisted code modernization or relicensing should have their teams document the rewrite process in detail, including AI prompts and isolation measures. Relying on AI alone for a "clean room" implementation, without strict human oversight and verifiable evidence of non-derivation such as low JPlag similarity scores, carries significant legal and ethical risk, particularly where existing open-source licenses apply. Be prepared for potential litigation as commercial entities confront similar IP challenges.
Key insights
AI-assisted code rewrites challenge traditional "clean room" definitions and open-source licensing compliance.
Principles
- Process guarantees alone do not define non-derivative work.
- Structural independence can be demonstrated through measurement.
Method
A design document is written first. An AI agent then generates code in an isolated environment (an empty repository), under explicit instructions to avoid LGPL/GPL-licensed code. Human review and iteration follow.
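One way to make the steps above auditable is to keep a log of each prompt and a hash of what it produced. The following is a minimal sketch, not the process Blanchard used: the `RewriteLogEntry` schema, the step names, and the helper functions are all hypothetical and chosen for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class RewriteLogEntry:
    """One step of an AI-assisted rewrite (hypothetical schema)."""

    step: str           # e.g. "design-doc", "generate", "human-review"
    prompt: str         # instruction given to the agent, verbatim
    output_sha256: str  # SHA-256 of the artifact produced at this step
    timestamp: str      # UTC time in ISO 8601 format


def log_step(step: str, prompt: str, output: str) -> RewriteLogEntry:
    """Record a step, storing a hash of the output rather than the output itself."""
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()
    now = datetime.now(timezone.utc).isoformat()
    return RewriteLogEntry(step, prompt, digest, now)


def to_jsonl(entries: list[RewriteLogEntry]) -> str:
    """Serialize entries as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


if __name__ == "__main__":
    entry = log_step(
        "generate",
        "Start from an empty repository. Do not consult LGPL/GPL-licensed code.",
        "def detect(data): ...",
    )
    print(to_jsonl([entry]))
```

Storing a hash instead of the full output keeps the log small while still letting a reviewer confirm later that a given file matches what the agent produced at that step.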
In practice
- Use plagiarism detection tools like JPlag to assess code similarity.
- Document AI-assisted rewrite processes meticulously.
- Consider new package names for relicensed projects.
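To illustrate what a structural similarity measurement looks like, here is a rough sketch using only Python's standard library. It is not JPlag and will not reproduce JPlag's scores: it swaps in `tokenize` plus `difflib.SequenceMatcher` as a simple stand-in. Identifiers and literals are collapsed to placeholders so that renaming variables does not hide copied structure. The function names `token_kinds` and `similarity` are assumptions made for this example.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry no structural information for this comparison.
SKIP = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENCODING,
    tokenize.ENDMARKER,
}


def token_kinds(source: str) -> list[str]:
    """Reduce Python source to a sequence of normalized token kinds."""
    kinds = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            kinds.append("ID")   # any identifier, regardless of its name
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            kinds.append("LIT")  # any literal, regardless of its value
        else:
            kinds.append(tok.string)  # keywords and operators kept verbatim
    return kinds


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] of matching normalized tokens."""
    matcher = difflib.SequenceMatcher(
        None, token_kinds(a), token_kinds(b), autojunk=False
    )
    return matcher.ratio()


if __name__ == "__main__":
    original = "def add(x, y):\n    return x + y\n"
    renamed = "def plus(left, right):\n    return left + right\n"
    unrelated = "import os\nprint(os.getcwd())\n"
    print(similarity(original, renamed))    # renaming alone does not lower the score
    print(similarity(original, unrelated))
```

A low score from a measure like this is evidence of structural difference, not proof of legal independence, which is the distinction the principles above draw between measurement and process guarantees.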
Topics
- Coding Agents
- Open-Source Licensing
- Clean Room Implementation
- AI-assisted Code Generation
- Derivative Works
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Software Engineer, Legal Professional
Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.