Following on from my recent look at LLMs (large language models) as something the postmodernists may have predicted, I wanted to add another layer onto that.

Let’s dive right in with the older, original definition of “hypertext” in semiotics, via Wikipedia:

“Hypertext, in semiotics, is a text which alludes to, derives from, or relates to an earlier work or hypotext. For example, James Joyce’s Ulysses could be regarded as one of the many hypertexts deriving from Homer’s Odyssey…”

It continues on with some more relevant info:

“The word was defined by the French theorist Gérard Genette as follows: ‘Hypertextuality refers to any relationship uniting a text B (which I shall call the hypertext) to an earlier text A (I shall, of course, call it the hypotext), upon which it is grafted in a manner that is not that of commentary.’ So, a hypertext derives from hypotext(s) through a process which Genette calls transformation, in which text B ‘evokes’ text A without necessarily mentioning it directly.”

Compare with the related term, intertextuality:

“Intertextuality is the shaping of a text’s meaning by another text, either through deliberate compositional strategies such as quotation, allusion, calque, plagiarism, translation, pastiche or parody, or by interconnections between similar or related works perceived by an audience or reader of the text.”

Speaking of plagiarism, I’ve made fairly extensive use of a plagiarism/copyright scanning tool called Copyleaks. The tool is decent for what it is, and the basic report format that it outputs for text that it scans looks like this:

So while this tool is intended for busting people’s chops for potentially trying to pass off the work of others as their own, the window that it shows us into intertextuality and the original sense of hypertext is quite an interesting one.

We can see here specifically:

  • Passages within a text that appear elsewhere in the company’s databases, and the original source of those passages
  • Passages which appear to have been slightly modified (probably to pass plagiarism checkers like this)
  • Some other bits and bobs, but those are the major ones
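To make those first two categories concrete, here is a rough sketch of how a checker might sort sentences into "identical", "modified", and "original" relative to one candidate hypotext. This is not how Copyleaks works internally (I have no idea what it does under the hood); the function name `classify_passages`, the 0.8 threshold, and the naive split on periods are all my own assumptions, and the only real library call is Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def classify_passages(text, source, near_threshold=0.8):
    """Label each sentence of `text` relative to one candidate `source`.

    Hypothetical helper for illustration only: 'identical' means the
    sentence appears verbatim (ignoring case), 'modified' means it is a
    close variant, 'original' means no close match was found.
    """
    def sentences(s):
        # Naive sentence splitter (an assumption): break on periods.
        return [part.strip() for part in s.split(".") if part.strip()]

    source_sentences = sentences(source)
    report = []
    for sent in sentences(text):
        # Best character-level similarity (0.0 to 1.0) against any source sentence.
        best = max(
            (SequenceMatcher(None, sent.lower(), src.lower()).ratio()
             for src in source_sentences),
            default=0.0,
        )
        if best == 1.0:
            label = "identical"
        elif best >= near_threshold:
            label = "modified"
        else:
            label = "original"
        report.append((sent, label))
    return report


hypotext = "Sing to me of the man, Muse. He was driven far journeys."
hypertext = ("Sing to me of the man, Muse. He was driven on far journeys. "
             "Dublin was grey that morning.")

for sentence, label in classify_passages(hypertext, hypotext):
    print(f"[{label}] {sentence}")
```

A real tool would compare against a huge database rather than a single source, but the layered labels are the part that interests me here.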

I find “plagiarism” as a concept to be somewhat of a bore. But looking at this as a way to analyze and split apart texts into their component layers and references suddenly makes the whole thing seem a lot more interesting. It allows for a type of forensic “x-ray” analysis of texts, and a peek into the hidden underlying hypotexts from which they may be composed.

The whole thing also calls to mind another, tangential type of forensic x-ray analysis for documents: the GitHub diff, which tracks revisions to a document.

This is not the most thrilling example of a GitHub diff ever, but it’s one I have on hand related to Quatria:

It’s easy enough to see the difference here in a simple file, though diffs can become quite complex as well. Both this and the original semiotic notion of hypertexts (as exposed through plagiarism checkers) seem like another useful avenue to explore in terms of how we might want to visualize AI attribution in a text.
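For anyone who wants to poke at the diff idea without GitHub, here is a minimal sketch using Python's standard `difflib.unified_diff`, which produces the same general "minus lines, plus lines" format. The helper names `revision_diff` and `added_lines`, the labels "draft" and "revision", and the sample text are all made up for illustration. Treating the added lines as the contribution of one particular author (human or AI) is my own assumption about how this might feed into attribution, not an established method.

```python
import difflib


def revision_diff(before, after, before_label="draft", after_label="revision"):
    """Return a unified diff between two versions of a text, as a list of lines."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=before_label,
        tofile=after_label,
        lineterm="",
    ))


def added_lines(diff_lines):
    """Return the lines the revision introduced.

    The first two diff lines are the '---' / '+++' file headers, so they
    are skipped; after that, every line starting with '+' is an addition.
    """
    return [line[1:] for line in diff_lines[2:] if line.startswith("+")]


before = "line one\nline two\nline three"
after = "line one\nline 2\nline three\nline four"

diff = revision_diff(before, after)
print("\n".join(diff))

# If the revision pass was done by a known author, the added lines are
# one rough candidate for what to attribute to that author.
print(added_lines(diff))
```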