AI can rewrite open source code—but can it rewrite the license, too?
Summary
The release of chardet 7.0, a popular open-source Python library for character encoding detection, has ignited a debate over AI-assisted rewrites, software licensing, and derivative works. Maintainer Dan Blanchard rewrote the library in approximately five days using Claude Code, achieving a 48x performance boost, and relicensed it from the LGPL to the more permissive MIT license. Blanchard says he started from an empty repository and instructed Claude not to use LGPL/GPL code, and he calls the result "structurally independent," citing JPlag statistics that show only 1.29 percent similarity to previous versions. The original author, Mark Pilgrim, argues that the relicensing is illegitimate: Blanchard had extensive exposure to the original LGPL-licensed code, and Claude's training data likely included earlier chardet versions, which raises the question of whether the new code is truly non-derivative.
Key takeaway
For CTOs and VPs of Engineering evaluating AI code generation tools, you must carefully consider the legal implications of AI-assisted rewrites, especially concerning open-source licenses. Your teams should establish clear protocols for "AI clean room" development, including strict separation from existing codebases and explicit instructions to the AI, to mitigate risks of unintended license violations and derivative work claims.
Key insights
AI-assisted code rewrites challenge traditional "clean room" reverse engineering and open-source licensing principles.
Principles
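- Structural independence from the original code is central to any claim that a rewrite is not a derivative work.
- An AI model's ingestion of the original code during training complicates claims that its output is not derivative.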
Method
To achieve an "AI clean room" rewrite, specify the architecture and requirements, start with an empty repository, and explicitly instruct the AI not to base its code on prior versions released under the original license.
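As a minimal sketch of two mechanical checks that could support this method, the Python below verifies that a repository holds nothing but `.git` metadata before work begins, and flags files containing common GPL-family license phrases or SPDX identifiers. The helper names and the marker list are hypothetical and far from exhaustive. A check like this only catches license text copied in verbatim; it cannot detect code paraphrased from the original.

```python
import re
from pathlib import Path

# Hypothetical, non-exhaustive markers for GPL-family licenses:
# the license names themselves and SPDX identifiers such as "LGPL-2.1-or-later".
COPYLEFT_PATTERNS = [
    re.compile(r"GNU (Lesser |Library )?General Public License", re.IGNORECASE),
    re.compile(r"SPDX-License-Identifier:\s*(L?GPL|AGPL)", re.IGNORECASE),
]


def is_empty_repo(root: Path) -> bool:
    """True if `root` contains nothing except the '.git' metadata entry."""
    return all(entry.name == ".git" for entry in root.iterdir())


def scan_for_copyleft_markers(root: Path) -> list[tuple[str, str]]:
    """Return (relative path, matched text) for each copyleft marker under `root`.

    Files inside '.git' are skipped; each file reports at most one hit per pattern.
    """
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for pattern in COPYLEFT_PATTERNS:
            match = pattern.search(text)
            if match:
                hits.append((str(rel), match.group(0)))
    return hits
```

Calling `scan_for_copyleft_markers(Path("."))` from the root of the new repository before each commit, for example, would surface a stray LGPL header early; it is not a substitute for legal review.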
In practice
- Use JPlag or a similar tool to measure structural similarity between the original and rewritten code.
- Document AI prompts and architectural specifications carefully.
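JPlag compares programs as normalized token streams rather than raw text, so renaming variables does not lower its similarity score. The Python below is a rough sketch of that idea under stated assumptions: it normalizes identifiers and literals with the standard `tokenize` module, then swaps in a simple Jaccard similarity over 5-token windows in place of JPlag's Greedy String Tiling algorithm. Its scores are therefore not comparable to JPlag's percentages, and the function names are hypothetical.

```python
import io
import keyword
import tokenize

# Token types that carry no structural information for this comparison.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENCODING, tokenize.ENDMARKER}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source, mapping identifiers to 'ID' and literals to 'LIT'
    so that renaming variables or changing constants does not hide copied structure."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)  # keywords and operators are kept verbatim
    return out


def shingles(tokens: list[str], k: int = 5) -> set[tuple[str, ...]]:
    """All contiguous k-token windows (empty if there are fewer than k tokens)."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def structural_similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the k-token shingle sets of two sources, in [0, 1]."""
    sa = shingles(normalized_tokens(a), k)
    sb = shingles(normalized_tokens(b), k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Because identifiers collapse to `ID`, a pure renaming pass leaves the score unchanged, while genuinely different control flow drives it toward zero. The window size `k` trades sensitivity against noise: small windows match common idioms across unrelated code, while large ones miss short copied fragments.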
Topics
- AI Code Generation
- Software Licensing
- Open-Source Software
- Derivative Works
- Legal Implications of AI
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Software Engineer, AI Engineer, Legal Professional
Editorial summary, takeaway, and curation by AIssential. Original article published by Ars Technica (AI section).