Homogeneity in AI art

Was reading this piece earlier by Haley Nahman about blandness and sameness in Netflix’s visual production quality. It raises a lot of interesting points about over-reliance on tools and techniques – and the deadening that can happen to art forms when they’re driven more by speed, efficiency, and profitability than by necessarily being “good.”

Generative AI is going to have the effect of blasting this problem into the stratosphere. And we haven’t yet seen AI visual production tools hit these kinds of mass markets. It’s still largely tinkerers and weirdos with GPUs in their basements creating things.

But the things this diverse band of weirdos tends to create are disappointingly homogeneous. Midjourney, in my opinion, is the worst for this. While the images tend to be very well done and often beautiful, I always look at them and think they look “Midjourney.”

Stable Diffusion isn’t too far behind either though. If you go to a site like PlaygroundAI.com’s homepage at any given moment, how many of the featured images are “sexy ladies” that basically all look the same? At this moment, in the first 15, I would say 11 of them fall into that category. That’s pretty much the norm.

If we’re seeing this massive democratizing effect because of generative AI, and all these millions or billions of imaginations are suddenly being unleashed, why is it that we all just end up making totally bland T&A shots?

I think there are at least two parts to it (probably more): the tools are predisposed to certain things, and bland mid-distance busts and portraits are among their strengths; and, hand-in-hand with that, the users are predisposed to certain things.

My hunch is also that there is a shift with generative AI where being a “creator” is only as important as being able to create the thing you want to consume. The act of creation with these tools is one and the same as consuming it.

Have definitely felt that slightly magical effect a few times using verb.ai in particular, where writing with it truly becomes collaborative, and the storytelling unfolds the way it does because I am the first audience. My invocation causes it to take the shape that it does for me. Yours is different. (Or should be, if our tools don’t force us into homogeneity…)

The process of writing with AI-assisted tools becomes one of assembly, and unfolding. There is a premise, or there is an intention, or there is an improvisation. Invocations. Call & response. Which parts of the conversation make the final cut? Can there ever truly be a final cut?

I digress, but want to return to the intent of attempting to burst the bubble of sameness… If the latent space is nearly infinite, why are we all clustering in this one small corner of it? What else is out there to explore and be uncovered in those wild territories?

A friend said something to the effect that seeing other people’s AI prompt results is a little like hearing other people tell you about their dreams. There may be elements that are interesting or resonate on occasion, but in a lot of cases, there’s kind of a “huh, weird” response. And that’s about it. ’Cause what can you do… It’s someone else’s dream, and the pieces don’t fit for the hearer the way they do for the dreamer.

So adapting that into AI-storytelling, well, your results (and mileage) may vary. The insanely awesome results you personally get in an AI text or image generator, the ones that seem exciting enough to share with friends or on social media, may have that sort of /shrug effect on other people. There’s something highly personalized about it, probably to do with the process and context of inquiry that surrounds it. It’s hard to translate that effect to secondary audiences without adding some other layer(s) of meaning and context.

It’s part of what I don’t like about Midjourney: that the experience of using it as an artist becomes tied up with the UX of Discord as a product. The experience of viewing generative AI images on PlaygroundAI or on Reddit is also flattening. It’s an experience of you as a user on a platform, having your imagination constrained to fit the contours and invisible social guardrails and incentives that drive our behaviors in those environments. It’s art for likes and upvotes, and accepting those as proxy replacements for, and measures of, actual goodness and meaning.

That is the real cause of the crushing sameness. But it is a sameness that is utterly alienating, instead of reassuring. The cruel embrace of the technological corners we have painted ourselves into. All of it illusions. Because now, all things are possible. All planets, all dimensions, all times can be envisioned & visited. Latent space is infinite. Live a little.
