Wanted to post a brief reply to this piece on the Verge by journalist Emilia David about a new organization called Fairly Trained, which, if I understand it correctly, aims to be a sort of “Fair Trade for AI,” offering certifications for AI models trained entirely on licensed data.

The Verge’s headline is, I think, technically inaccurate. It states: “AI models that don’t violate copyright are getting a new certification label.”

They also say this about Fairly Trained, the would-be certifying body:

Fairly Trained claims it will not issue the certification to developers that rely on the fair use argument to train models.

I think this journalist may have taken Fairly Trained’s claims about what Fair Use actually is too much at face value. Their blog post goes a bit further than what’s stated in the Verge. Quoting from that:

[…] this certification will not be awarded to models that rely on a ‘fair use’ copyright exception or similar, which is an indicator that rights-holders haven’t given consent for their work to be used in training.

[Quoting an exec at Universal Music Group] ‘We welcome the launch of the Fairly Trained certification to help companies and creators identify responsible generative AI tools that were trained on lawfully and ethically obtained materials.’

If you have a generative AI model that doesn’t rely on scraping and fair use arguments for its training data, we’d love to hear from you…

My contention with all of this is as simple as it is currently unpopular: anything that qualifies as Fair Use does not constitute a violation of copyright.

Stanford has a decent page on Fair Use here. Excerpt:

Such uses can be done without permission from the copyright owner. In other words, fair use is a defense against a claim of copyright infringement. If your use qualifies as a fair use, then it would not be considered an infringement.

So I think we can separate this announcement by Fairly Trained, and the Verge’s coverage of it, into two things:

  1. The claim that Fair Use is a violation of copyright – my understanding is that it is not, and this claim probably doesn’t hold water under scrutiny.
  2. The recognition that creators have a legitimate desire to have greater control than they do under current Fair Use laws, which seem to plainly permit these kinds of uses in AI training.

While taking issue with the first, I support the second fully. I agree that we need radical new ways for artists (I hate the word “creators” because it reeks of churning out endless ‘content’ – can’t we all just be artists, creating more than ‘content’?) to contribute high-quality material to fully licensed data sets, where everybody knows what they are getting into, and where clear mechanisms ensure that artists themselves get paid directly – not intermediaries like the collecting societies in France seeking to change the law in their favor, at the likely expense of contributing artists.

I do think there is a place for these kinds of certifications and other allied efforts, but I don’t find it useful for their purveyors to push seemingly inaccurate legal conceptions. I don’t see who that benefits. We can say we want to change what the law is, or how it ought to be interpreted, but we should also recognize what it actually says today and how it has actually been interpreted in the past. From there, we can point ourselves towards more informed aspiration, and build the realities we want to see one Jira issue at a time…