Hyperreality As Analytical Framework

I have spent a great deal of time lately analyzing documents put out by the Partnership on AI regarding Synthetic Media & Safety, and by WITNESS & the MIT Co-Creation Studio around deepfakes + satire. There's a lot of interesting material in both, which I've already written about recently, but I wanted to share my own (possibly unique) perspective as someone who has, let's say, worked both sides of the aisle: as a satirist, and as someone working in online safety.

Hyperreality Lens

First, I’m applying hyperreality as my lens. You.com/chat gave me a serviceable definition of hyperreality, which is mostly paraphrased from the Wikipedia article, it seems:

Hyperreality is a concept used to describe a state in which what is real and what is simulated or projected is indistinguishable. It is a state in which reality and fantasy have become so blended together that it is impossible to tell them apart. Hyperreality is often used to describe the world of digital media, virtual reality, and augmented reality, where the boundaries between what is real and what is simulated have become blurred.

Maybe it's just me, but this feels like a useful starting point because it treats shades of grey (and endless blending) as the natural state of things nowadays. That blurring is now a 'feature, not a bug' of our information ecosystems. And even though Truth might still be singular, its faces are now many. We need new ways to talk about it and understand it.

AI is like water

And on top of that, I think this quote from the CEO of Midjourney, likening AI to 'water,' is very useful:

Right now, people totally misunderstand what AI is. They see it as a tiger. A tiger is dangerous. It might eat me. It’s an adversary. And there’s danger in water, too — you can drown in it — but the danger of a flowing river of water is very different to the danger of a tiger. Water is dangerous, yes, but you can also swim in it, you can make boats, you can dam it and make electricity. Water is dangerous, but it’s also a driver of civilization, and we are better off as humans who know how to live with and work with water. It’s an opportunity. It has no will, it has no spite, and yes, you can drown in it, but that doesn’t mean we should ban water.

Water is also, for the most part, ubiquitous (droughts and deserts notwithstanding), as AI soon will be. In the coming years, it will be built into, or able to be plugged into, just about everything.

Lingua Franca

Thinking of it that way, we need a new language to talk about these phenomena which will, as Jack Clark aptly pointed out, lead to “reality collapse.” That is, we need a new lingua franca, and I suspect that we have that in the concept of hyperreality; we just need to draw it out a little into a more comprehensive analytical framework.

Dimensionality

One thing I've observed in other analyses of the conversations emerging around AI-generated content and allied phenomena is that a fair amount of reduction is happening. Possibly too much. Most discussions seem to center on a very limited set of axes to describe what's happening:

  • Real vs. fake
  • Serious vs. satire
  • Harmful vs. responsible
  • Labeled vs. unlabeled

Certainly those form the core of the conversation for a reason; they are important. But alone they give an incomplete picture of a complex thing.

Speaking as someone who has had to do the dirty work of a lot of practical detection and enforcement around questionable content, I think what we need is what might be called in machine learning a “higher-dimensional” space to do our analysis. That is, we need more axes on our graphs, because applying low-dimensional frameworks appears to be throwing out too much important information, and risks collapsing together items which are fundamentally different and require different responses.

It's interesting that, once we open up this can of worms, a more dimensional approach actually corresponds quite closely to the so-called "latent space" that is so fundamental to machine learning. A simple definition:

Formally, a latent space is defined as an abstract multi-dimensional space that encodes a meaningful internal representation of externally observed events. Samples that are similar in the external world are positioned close to each other in the latent space.

In ML, as I understand it (I had to ask ChatGPT a lot of ELI5 questions to get this straight): for the items in a dataset, we have characteristics, each of which is a feature. The set of features that describes an item is its feature vector. Each feature corresponds to a dimension, which is essentially a measurement of the presence and degree of a given feature. So a higher-dimensional space uses more dimensions (to measure features of items), while a lower-dimensional space attempts to translate down to fewer dimensions while still remaining adequately descriptive for the task at hand.
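
To make that concrete, here's a minimal sketch in Python (using NumPy; the feature names, artifact names, and values are all invented for illustration) of items represented as feature vectors, and of how collapsing many dimensions down to one can erase exactly the distinctions that matter:

```python
import numpy as np

# Each artifact is described by a feature vector: one number per feature,
# measuring the presence/degree of that feature (0 = none, 4 = a lot).
# The features and values here are invented for illustration.
feature_names = ["fake", "satirical", "ai_generated", "harmful", "labeled"]

artifacts = {
    "deceptive_deepfake": np.array([4, 0, 4, 3, 0]),
    "labeled_satire":     np.array([4, 4, 0, 1, 4]),
}

# In the full 5-dimensional space, these two items are far apart...
full_dist = np.linalg.norm(
    artifacts["deceptive_deepfake"] - artifacts["labeled_satire"]
)

# ...but reduced to a single "real vs. fake" dimension, they collapse
# together and look identical: the information loss described above.
reduced = {name: int(vec[0]) for name, vec in artifacts.items()}
reduced_dist = abs(reduced["deceptive_deepfake"] - reduced["labeled_satire"])

print(f"distance in 5 dimensions: {full_dist:.2f}")  # -> 7.21
print(f"distance in 1 dimension:  {reduced_dist}")   # -> 0
```

Both items score a 4 on "fake," but everything that distinguishes a harmful deepfake from clearly labeled satire lives in the dimensions the one-axis view throws away.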

In my mind, anyway, it seems altogether appropriate to adopt the language and concepts of machine learning to analyze phenomena that include generative AI, which is itself usually just machine learning. It seems to fit more completely than older models do, but maybe that's just me…

Higher-dimensional analysis of hyperreality artifacts

So, what does any of that mean? To me, it means we simply need more features, more dimensions to measure along. More axes in our graph. I spent some time today trying to come up with more comprehensive characteristics of hyperreality artifacts, and maxed out at around 23 pairs of antonyms that we might try to map to any given item under analysis.

However, when I tried to depict that many visually, it quickly became apparent that they were difficult to show clearly in pictorial form. So I ended up reducing the set to 12 pairs of antonyms: essentially 24 features, each of which corresponds to a dimension, which may itself have a range of values.

Here is my provisional visualization:

And the pairs, or axes, that I applied in the above go like this:

  • Fiction / Non-fiction
  • Fact (Objective) / Opinion (Subjective)
  • Cohesive / Fragmentary
  • Clear / Ambiguous
  • Singular / Multiple
  • Static / Dynamic
  • Ephemeral / Persistent
  • Physical / Virtual
  • Harmful / Beneficial
  • Human-made / AI generated
  • Shared / Private
  • Serious / Comedic 

From my exercise in coming up with this list, I realize that the items included above as axes are not the be-all and end-all here. The list is not meant to be comprehensive, and other items may prove useful for specific types of analysis. In fact, even in coming up with this list, I realized how fraught this kind of exercise is: how many holes there are in it, and how much wiggle room. But I wanted to come up with something broadly descriptive, above and beyond what I've seen anywhere else.
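
As a rough sketch of how these axes might be captured in code (Python; the example artifact and its scores are hypothetical, and the scoring convention is invented), each artifact becomes a twelve-dimensional vector, with one dimension per antonym pair:

```python
# The 12 antonym pairs above, each treated as one axis/dimension.
# By convention here (an invented scheme), 0 leans fully toward the
# left-hand term and 4 fully toward the right-hand term.
AXES = [
    ("Fiction", "Non-fiction"),
    ("Fact (Objective)", "Opinion (Subjective)"),
    ("Cohesive", "Fragmentary"),
    ("Clear", "Ambiguous"),
    ("Singular", "Multiple"),
    ("Static", "Dynamic"),
    ("Ephemeral", "Persistent"),
    ("Physical", "Virtual"),
    ("Harmful", "Beneficial"),
    ("Human-made", "AI generated"),
    ("Shared", "Private"),
    ("Serious", "Comedic"),
]

# A hypothetical artifact scored against every axis (values invented).
parody_deepfake = [0, 1, 3, 3, 0, 1, 2, 4, 2, 4, 0, 4]
assert len(parody_deepfake) == len(AXES)

for (left, right), score in zip(AXES, parody_deepfake):
    print(f"{left} / {right}: {score}")
```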

Graphing Values

What's the benefit of visualizing it like this? Well, having a chart helps us situate artifacts within the landscape of hyperreality; it lets us make maps. I wasn't familiar with radar graphs (also called spider graphs) before trying to understand how to represent high-dimensional sets visually, but they turn out to be quite useful in this context.

I found a pretty handy site for making radar graphs here, and plugged my data labels (features) into it. Then, for each one, I invented a value from 0 to 4, corresponding to the range of the dimension. Here's how two different sets of values look, mapped to my graphic:

Now, these are just random values I entered to give a flavor of what two different theoretical artifacts might look like. I'm not really a "math" guy, per se, but once you start visualizing these with dummy values, it becomes clear right away that you could make useful and meaningful comparisons between artifacts under analysis, provided you apply a common set of criteria to generate the scores.
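
If you'd rather generate these graphs locally than through a website, here's a rough sketch with Python and matplotlib (the labels are shortened versions of my axes, and both sets of values are dummy data, just like in my examples above):

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ["Fiction", "Fact", "Cohesive", "Clear", "Singular", "Static",
          "Ephemeral", "Physical", "Harmful", "Human-made", "Shared", "Serious"]

# Two theoretical artifacts, scored 0-4 on each axis (dummy values).
artifact_a = [3, 1, 4, 2, 0, 1, 3, 4, 2, 0, 1, 2]
artifact_b = [1, 4, 2, 0, 3, 4, 1, 2, 0, 4, 3, 1]

# A radar/spider graph is just a line plot on a polar axis; repeating
# the first value at the end closes each polygon.
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
for values, name in [(artifact_a, "Artifact A"), (artifact_b, "Artifact B")]:
    closed = values + values[:1]
    ax.plot(angles, closed, label=name)
    ax.fill(angles, closed, alpha=0.15)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels, fontsize=8)
ax.set_ylim(0, 4)
ax.legend(loc="upper right")
plt.show()
```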

Criteria & Scoring

So the way you would generate real scores would be: first, decide on the features/dimensions you want to study within your dataset. Then come up with criteria that are observable in the data and as objective as possible. You should not have to guess for things like this; if you are guessing a lot, your scores are probably not going to be especially meaningful. You want scoring to be repeatable and consistent, so that you can make accurate comparisons across diverse kinds of artifacts and group them accordingly. A simple way to score would be a 0 for "none" and a 1 for "some." Beyond that, you could use higher numbers for the degree to which a given feature is observable in an artifact. So in the examples above, 1 could represent "a little" and 4 "a whole lot."
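
To make repeatable, observation-based scoring concrete, here's a minimal sketch for a single dimension (Python; the signal names are hypothetical stand-ins for whatever is actually detectable in your own dataset):

```python
# Hypothetical observable signals for one dimension ("AI generated").
# Real signals would come from whatever is reliably detectable in
# your data, not from guesswork.
AI_GENERATED_SIGNALS = [
    "provenance_metadata_indicates_ai",
    "creator_disclosed_ai_tooling",
    "detector_model_flagged_synthetic",
    "artifacts_typical_of_generative_models",
]

def score_dimension(observed: set, signals: list, max_score: int = 4) -> int:
    """Count how many observable signals are present, capped at max_score.

    0 means "none" and max_score means "a whole lot," matching the
    0-4 range used on the radar graphs above.
    """
    hits = sum(1 for s in signals if s in observed)
    return min(hits, max_score)

observed = {"creator_disclosed_ai_tooling", "detector_model_flagged_synthetic"}
print(score_dimension(observed, AI_GENERATED_SIGNALS))  # -> 2
```

Because the score is just a count of concrete observations, two different reviewers (or an automated pipeline) should land on the same number for the same artifact, which is what makes comparisons across artifacts meaningful.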

Taking Action

Within an enforcement context (or any kind of active response, really, such as fact-checking), once you've got objective, measurable criteria that allow you to sort artifacts into groups, you can then assign each group a treatment, mitigation, or intervention: in other words, an action to take. This is usually done based on risk: likelihood, severity of harm, and so on.
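
A sketch of that grouping-to-action step might look like the following (Python; the two input dimensions, the thresholds, and the action names are all invented for illustration, since a real policy would derive them from your own risk framework):

```python
def choose_action(harm_severity: int, reach: int) -> str:
    """Map two 0-4 scores to a treatment, roughly ordered by risk.

    harm_severity: how severe the potential harm is (severity).
    reach: how widely the artifact is likely to spread (likelihood).
    """
    risk = harm_severity * reach  # crude risk = severity x likelihood
    if risk >= 12:
        return "remove_and_escalate"
    if risk >= 6:
        return "label_and_downrank"
    if risk >= 1:
        return "add_context_or_fact_check"
    return "no_action"

print(choose_action(harm_severity=4, reach=4))  # -> remove_and_escalate
print(choose_action(harm_severity=1, reach=1))  # -> add_context_or_fact_check
```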


Anyway, I hope this gives people working in this space some useful tools and mental models to apply in actual practice. Hopefully it opens the conversation up beyond narrowly deciding whether something is real or fake, serious or satire, and getting stuck in the limited outcomes those labels seem to point us towards.

Hyperreality is here to stay – we might as well make it work for us!
