Here’s another research-y paper I found while egosurfing on Google Scholar, The Generative AI Pirate? The Intersection of Copyrights and Generative AI in Literary Works by Brianne M. Culliton. Some googling yields that the author of the paper appears to be a paralegal rather than a lawyer. That’s fine, because I’m obviously not a lawyer either, but I have certainly spent a great deal of time working on these topics of gen AI + copyright.
Rebutting the entire article is outside the scope of this blog post, but I will at least respond directly to the part in which I am referenced.
I’m in the section on transformative use, which begins benignly enough:
The first factor in the fair use analysis is the purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes. The Court views this factor as a problem of substitution and looks to whether the new work would achieve the same purpose as the original, effectively acting as a substitute for the original work. The more similar a work, the more likely it is to be deemed a substitute. This factor considers the degree to which a new work transforms the purpose or character from the original work. This is often referred to as “transformative use.”
Okay, that seems straightforward enough. A couple paragraphs later, we start getting into it:
There are others who have created entire businesses prompting generative AI to create a literary work, then lightly editing it and listing the work for sale within 6 to 8 hours. In these situations, the purpose and character of the newly created work may be virtually identical to that of the original copyrighted work.
When considering substitutes and transformative nature, the issue is similar to the purpose and character analysis above. Many of the uses of generative AI would likely create outputs considered transformative and would not create a substitute of the original work. However, individuals using generative AI to create outputs of new literary works are different. It could be argued that those works do provide substitutes, serve the same purpose as the original work, and may or may not transform the meaning or expression sufficiently to constitute fair use.
What I find troubling here are all the unsupported, unexamined assumptions, which I’ll try to pick apart line by line below.
Culliton writes, “In these situations, the purpose and character of the newly created work may be virtually identical to that of the original copyrighted work.”
This seems like something of a non sequitur to me, coming as it does after a statement about how long a work takes to produce under my methodology. I don’t see how the length of time spent is any indication of how transformative a given piece is relative to the underlying copyrighted works. My understanding is that in copyright law, there are tests of substantial similarity between an original piece and its alleged infringing copies.
So, in my eyes, we would first need to:
- Identify the original work(s) allegedly infringed
- Identify the new work(s) which allegedly infringe the original sources
- Demonstrate the elements from the new work(s) which are substantially similar to the original work(s)
As far as I can tell, the author has made no attempt to undertake any such analysis. They’ve merely made an assertion without any evidence or support.
The truth of the matter is: analyze all you want, you will NEVER identify specific individual original works which any of my books are allegedly infringing on. LLMs and image diffusion models are trained on billions and trillions of data points. They don’t go in and clip out a paragraph here, and a line there, and cut and paste this all into a new document where you can cleanly trace all the antecedents, and perfectly identify: oh, this came from here, this from here… It’s instead a statistical mixing of functionally uncountable data points. Put more simply, it’s like taking all of human writing and putting it into a blender.
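To make the “blender” point a little more concrete, here’s a deliberately crude toy sketch. It is emphatically not how a real LLM works – production models are neural networks trained on billions of tokens, not character counts – but it captures the pooled-statistics idea: counts from all sources get merged into one table, and generated text is sampled from the blend, with no per-source bookkeeping that would let you trace a given output character back to a given input text. All the source strings and function names here are invented for illustration.

```python
import random
from collections import defaultdict

# Toy character-level trigram model: a crude stand-in for the
# "statistical mixing" idea. Real LLMs are vastly more complex,
# but the pooled-statistics principle is analogous.
sources = [
    "the quick brown fox jumps over the lazy dog",
    "a quick study of brown bears in the wild",
    "the lazy afternoon passed over the quiet town",
]

# Count how often each character follows each two-character context,
# pooled across ALL sources at once. After this loop there is no
# record of WHICH source contributed WHICH count.
counts = defaultdict(lambda: defaultdict(int))
for text in sources:
    for i in range(len(text) - 2):
        counts[text[i:i + 2]][text[i + 2]] += 1

def sample(seed, length=40):
    """Generate text by repeatedly sampling the next character
    from the pooled statistics; no single source is copied."""
    out = seed
    for _ in range(length):
        ctx = out[-2:]
        nexts = counts.get(ctx)
        if not nexts:
            break  # dead-end context: stop generating
        chars, weights = zip(*nexts.items())
        out += random.choices(chars, weights=weights)[0]
    return out

random.seed(0)
print(sample("th"))
```

Even in this tiny example, the output is a statistical blend of all three strings at once; scale the idea up by many orders of magnitude and the impossibility of pointing to a specific antecedent becomes clearer.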
Therefore, in my opinion – based on my understanding of the current state of these technologies – it is infeasible and incorrect to say that a given AI-assisted book produced by myself is a non-transformative reproduction of underlying training data. You can’t work backwards to identify the training data used. And you can’t take elements from my books, set them alongside elements of those (unidentifiable) training data sources, and show any substantial similarity.
Thus, the equally unsupported argument that these completely new and completely transformative works somehow act as “substitutes” in the market for the original works included in the training data falls apart. As I went into at length, I am not surreptitiously using the names of other authors or the titles of their works to market my works.
The paper also states, as identified above:
Many of the uses of generative AI would likely create outputs considered transformative and would not create a substitute of the original work. However, individuals using generative AI to create outputs of new literary works are different.
It’s entirely unclear to me why the author first agrees that many uses of generative AI are transformative, but then, for some magical unstated reason, goes on to argue that the use of generative AI for literary purposes is categorically non-transformative. Um, okay, but why?
So, anyway, that’s my analysis. Take it or leave it. I’m not a lawyer, and not even a paralegal. I’m just an artist making use of these tools every day. It’s interesting how many people have written papers which include references to my work, but never actually reached out to me to discuss it, nor do they seem to have investigated the humongous mountain of content I’ve published elaborating on all of this on my website.
¯\_(ツ)_/¯
Tim B.
I should also add that, in addition to not being able to trace or identify specific sources in the training data, at this point – well after the fact of having produced these books – I’m often not entirely able to tell which passages of text were my wording versus the AI’s. So for an outside observer, the task of determining who contributed which parts of a given human-edited text would be effectively impossible.
I wrote a little more about this a while back here:
https://www.timboucher.ca/2022/09/disorientation-in-ai-writing/