← HomeLogin
Can coding agents relicense open source through a “clean room” implementation of code?
~ai.agents~dev~law.licensingauthor.simon willisonopen sourcepython
simonwillison.net 4 weeks agoTildes

Summary

chardet was created by Mark Pilgrim back in 2006 and released under the LGPL. Mark retired from public internet life in 2011 and chardet’s maintenance was taken over by others, most notably Dan Blanchard who has been responsible for every release since 1.1 in July 2012.

Two days ago Dan released chardet 7.0.0 with the following note in the release notes:

Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x. Just way faster and more accurate!

Yesterday Mark Pilgrim opened #327: No right to relicense this project:

[...]

There are several twists that make this case particularly hard to confidently resolve:

  • Dan has been immersed in chardet for over a decade, and has clearly been strongly influenced by the original codebase.

  • There is one example where Claude Code referenced parts of the codebase while it worked, as shown in the plan—it looked at metadata/charsets.py, a file that lists charsets and their properties expressed as a dictionary of dataclasses.

  • More complicated: Claude itself was very likely trained on chardet as part of its enormous quantity of training data—though we have no way of confirming this for sure. Can a model trained on a codebase produce a morally or legally defensible clean-room implementation?

  • As discussed in this issue from 2014 (where Dan first openly contemplated a license change) Mark Pilgrim’s original code was a manual port from C to Python of Mozilla’s MPL-licensed character detection library.

  • How significant is the fact that the new release of chardet used the same PyPI package name as the old one? Would a fresh release under a new name have been more defensible?