What do we want from labeling of generative AI media?

In conversations around the use of labeling in synthetic or generative media (made with help from AI), sometimes the desirability of labeling in the first place seems like a foregone conclusion. I wonder if we can’t unpack that a little though, and by doing so maybe make the bigness of it all a little more comprehensible.

I won’t go so far as to say that I think it’s “solvable,” though. I think it’s just the new baseline condition we’re living in. And you don’t solve these kinds of conditions (not without dismantling many underlying structures, anyway) — you end up having to find a way to live with them.

It might actually be helpful to use that simple frame in the development of informal conventions and more formal standards: just giving us tools to live with the deluge, rather than framing it as a thing we’re going to “solve” once and forever, or as some kind of “fight” or conflict that we have to win. Both of those lenses bring us towards more rigid conclusions about what is and isn’t possible, and end up hemming in our imaginations, keeping us from more fluid and less fragile conceptions that might end up being easier to apply to real life.

So the questions I want to put on the table to start with are simple:

1. What is the desired effect of putting labels onto generative AI content? In other words, what do we believe it gets us?
2. What do we want the end user to do or not do as a result of the presence or absence of these labels?
3. How can we validate that A) labeling actually drives the user behavior we want, and B) that behavior then has the desired effect?

Perhaps a related question would also be appropriate to examine from the beginning: how big of an impact can we realistically expect any labeling will have on users who are just fast-scrolling through news feeds all day?

Take for example this Verify tool view for a given image, part of Adobe’s Content Authenticity Initiative, which implements the C2PA standard:

I think the tool is well-intended, and there’s a lot of information being conveyed through this visualization of metadata.

But let’s make a conservative estimate that maybe the average web user encounters about 1,000 individual media artifacts in a given day of scrolling, swiping, and browsing.

What’s the desired user behavior here? That each time a user encounters a piece of content, they should use an inspector to look up its provenance trail and then make some kind of rational determination based on the metadata that they find?

I think that pattern might appeal to a very narrow slice of users who have specific, especially occupational, needs related to information verification (like journalists), but for the average web user it might be an undesirable extra set of steps. So I think it’s safe to say most people won’t do that, except in special cases. That would mean integrating metadata directly into the feed where people consume the content (as opposed to an external inspector like the one depicted above), and then hoping that it has the desired effect.

To speak more to that, I think sometimes the unexamined assumption in these conversations is that, merely by being exposed to metadata about provenance, people’s rational minds will be (rightly) persuaded to make appropriate reality-based decisions. It’s a noble idea, and it probably stems from Enlightenment-era ideals about the superiority of Reason, etc. But it’s very much not how most people probably make most decisions, especially not on the internet.

We’ve been endlessly taught to “like” things online, or upvote things because we like them (or block them because we don’t like them). Not because they are inherently Good, or Right, or True. But because they align with our existing beliefs, notions, affiliations, identity, and feelings.

Where then does a rationally-intended datum about origin belong in all of that? It’s less clear than it seems at first glance… though there is a school of thought that says we should always make available more contextual data wherever we can. In that view, provenance becomes just one of many dimensions of contextual data (perhaps not unlike the Four Corners Initiative). Looking at this problem space from that broader picture might open up new vistas of what’s possible and desirable.

Going back to question 2 above: if the end user behavior we want is actually that users slow down when consuming information (as I touched on here), consider things more carefully, examine metadata, and make rational decisions, well, then we need to redesign social media and the modern web altogether. Putting another sticker on top of the mechanistic beast that drives us directly in the opposite direction by design is, in my humble opinion, not going to have the desired effect.

If what we want is actually more like “slow media,” and careful, considered consumption, then let’s rebuild everything underneath these labels. Let’s change the core technology that brought us here in the first place. Otherwise, there’s a decent chance we will run around chasing our tails and wondering why things aren’t working as intended.
