Questionable content, possibly linked

Tag: ai

Using Photoshop Remove Tool to Create New Paintings

Accidentally discovered something very interesting while messing around with the Photoshop Remove tool, which on certain settings uses generative AI to replace whatever you select with it.

In this case, was just playing around removing stuff from a photo of our living room, and saw that the tool was inventing paintings or posters that don’t exist:

I zoomed in on the one in the corner, ran it through a few AI upscalers, and got this still altogether vague “detail” shot:

It’s not on its own the greatest image ever seen, but there’s something spooky about how it looks and its ultimate provenance as an AI re-inventing my physical surroundings… almost has a ‘paranormal’ quality to me somehow. I had the idea of like, could I take these AI-imagined figments, and then do like I did with my Matisse copy and make human-done reproductions? Either as faithfully as possible to the original, or else with some enhancements by the human artist?

Here are some other examples. This photo was from about halfway through last year, and represented all the paintings I’d done lately (minus two in another room).

The funny part is, this original real photograph had a couple gaps where there was empty wall space. So I went into those two areas (marked with arrows in the image above) and used the Remove tool in Photoshop on them, and it invented these two other paintings that it thinks look natural there.

Eerily, the one on the left actually shares a lot of characteristics with a painting which I did end up doing later on, this one, the Head of Hygiea:

Photoshop got the color scheme more or less exactly right, and a good bit of the overall “vibe,” but it just wasn’t able to see into the subject matter. Otherwise, I’d almost call that “prescient.”

Here’s one last one for the road, two paintings in storage in my small basement studio from over a decade ago (depicted at bottom half of image below). And alongside you can see the progressively extended painting that the Photoshop Remove tool created over a few rounds of trying.

The forms and mark-making on these AI-interpolations don’t quite ring true with my actual painted works, and they aren’t quite snazzy enough yet for me to simply want to replicate them manually. But these are just early accidents and experiments. There is much hyperspatial painting to be explored and uncovered here still.

Ask An AI Guy: Handling Criticism & Pen Names As An AI-Assisted Writer

A reader sent me the following question, and gave me permission to post it and reply publicly here.

I found your Newsweek article from 2023, and it eventually led me here. I recently finished a 131,000-word manuscript using ChatGPT as a collaborator. While it’s not at a releasable quality yet, I’ve seen how AI can enhance creativity when paired with significant input.

I’m now breaking that manuscript into a trilogy of 80,000-85,000 words for each book. My process involves using AI to fill in smaller details while I focus on the larger creative elements. The result feels unique even compared to the over 300 sci-fi books I’ve read.

I see a fair amount of backlash against AI-generated work.

Given your experience, I’d love your perspective:
• How do you handle criticism or backlash toward AI-assisted writing?
• Do you recommend addressing it directly, or letting the work speak for itself?
• Is using a pen name a good strategy, or does transparency have more value in this space?

AI is allowing me to channel creativity I wouldn’t have been able to otherwise. I have no idea where this will take me. Your thoughts would mean a lot as I continue this journey.

So I should preface this by saying I’m probably a very hard-headed person, or else the torrents of shame and hatred people have tried to rain down on me for doing what I am artistically would be maybe a bit too much to handle. Second, I also spent many years working in content moderation, and handling complaints for a platform. This habituated me to dealing with “anger at scale” and seeing philosophically but also very concretely how, no matter what happens in the world, and where you’re sitting in relation to it, there is no shortage of unhappy people who will come in and shit all over it. This is just what people do, and the world we live in. Everybody’s mad because everything sucks, and I’m obviously no different.

So I guess you could say I’ve had a lot of training and conditioning in dealing with this sort of vitriol. To the point where I pretty much don’t take any of it seriously anymore, and years ago gave up being the guy whose job is to try to somehow sort out and make sense of everyone else’s anger and frustrations. It’s just not my responsibility anymore – and it once was, so I know concretely and cleanly the difference, which might be harder to sort out if you’re new to the kind of extremely strong reaction that what you’re describing is likely to engender.

So, more specifically, one concrete way I handle criticism is by not posting on social media accounts. If something of mine comes up on one of those networks, I might pop in as a guest and see what I can of the comments to understand what people are saying, and what their perspectives are, but I never try to engage people on those platforms, because it’s just not worth it. Angry people always feel they are right and justified in their anger. And they are emboldened when others do the same, so you can’t win in the places they congregate, imo. That’s why instead, I just write what I write here on my blog. Sometimes I respond to things I find elsewhere, but there are no comments here, so people can’t come here and dogpile me, and I am able to think in peace. People can, of course, follow links to email me directly. But I’ve never had a single person sit down to write me a thoughtful email that was angry or even a critique. Everyone who emails is curious and interested, so I take those kinds of communications to be much more important and valuable because they are person to person, and they are not social performance designed to enrage and attract likes.

I tend to address the criticisms directly when they seem interesting or spark a new way of thinking about these issues. And I agree there are issues with these technologies, a great many of them, and have spent a huge amount of time talking about them on my blog, in panel discussions, interviews, podcasts, etc. So yes, I think – for me, anyway – addressing them directly is important and necessary, and helps frame the conversation in better directions, even if these blog posts don’t garner likes on social media.

As to pen names, I think it depends on what your personal preferences and tolerance for criticism are. Because of my prior work handling complaints, moderating content, working in privacy/data protection, I didn’t put photos or video of myself online for years and years, and was extremely protective of my identity. Because I knew what kinds of horrible things people are capable of online. But then, eventually, I had the chance to talk about what I was doing, and if you want to play ball in the media, you have to use your face and your name, pretty much. Maybe there’s a way to get press anonymously or pseudonymously, but I think it would be more difficult and greatly reduced compared to what I’ve been able to do by exposing myself and my person. I think also there’s something to be said for not being cowed, not being shamed or shouted down by the raging mobs, and simply being like, yeah, this is me, this is an art experiment I’m doing, etc. I’m not forcing anybody to like it, but I’m being true to my part of the dance, following my inspiration, battle testing the tools, finding the good and bad in all of it, and just sharing and being upfront about it.

That said, using another identity as a shield can be a very very good thing psychologically, and for other practical reasons. One benefit can be that it gives you a kind of mental distance – almost a ‘plausible deniability’ – for when the angry hordes come for you (and they will if you’re upfront about what you’re doing). This helps you to see that the people are reacting to the front you’re putting up about what you’re doing, and lets you get less hung up on reacting to like, oh they’re reacting to ME!! and they hate ME!! Like, okay, sure, they might “hate you” in the way that internet people hate everything, all the time, for any or no reason. But the people who complain most about my work haven’t read any of it, haven’t really read any of my articles or blog posts in any depth, haven’t listened meaningfully to my podcast or panel appearances, and haven’t actually engaged with anything I’m saying. They’re generally just reacting to a headline, and – dare I say – repeating popular talking points they heard elsewhere about why “AI bad” and how I’m just another example of ____[thing they already hate].

So, anyway, I don’t know if this is inspiring or even helpful or not, but it’s at least a true accounting of my experience. Your mileage, as ever, may vary.

Anyone else have any questions, feel free to email me, and assuming that’s okay with you, please provide permission to use the text of your email publicly. I will redact any personally identifying information from the text of your inquiry.

Cheers!

Notes on Mysterious Plasmoids

Mysterious Plasmoids is the 124th installment in the AI Lore Books series. It is the first book I’ve done in the “Mysterious…” series in quite some time, and is another ‘ripped from the headlines’ hot take on what is happening with the drone/orb situation that is supposedly happening globally (I’ve not seen any anomalies first hand, myself). It also continues in another thematic subset of my books that relate to various aspects of UAP/UFO phenomena. This one heavily references other books in that cluster, if non-linearly.

There’s a vivid dream description of mine which fellow blogger Ran Prieur documented way back in 2005 here. In it I dreamt of a hyper-nationalist/fascist future US where police sirens played the song “America, the Beautiful,” and aliens had invaded the United States… Excerpt below:

New York City had been divided into northern and southern zones, via a gigantic wall and forcefield. The southern half still had people living and working in it. But the northern half was completely off-limits. The official story was that aliens (space, not illegal) had taken over the northern half of the city, and the rest of the United States northward.

We knew, however, that this official story was largely a fabrication. But that was all we knew. We had to roam about the lower half of the city, trying to find a passage to the north. And we had to do so without arousing any suspicions, which was an extremely difficult task. No one in the city would answer questions or help us.

And the police presence was total. You had to keep moving at all times. Any group of people who were stopping to talk or otherwise congregate was quickly spotted and broken up by patrolling police. […]

The police also had flying discs which they sent out after you. They were autonomous electronic devices which hovered and would track you as you ran. Once they were within range, they would fire an electric bolt at you to incapacitate you until officers arrived. The discs were called “temperplexes,” and they were all apparently controlled by larger motherships which flew higher and basically looked like UFO’s.

I actually continued that dream and spun out more variations using AI, publishing them in an earlier volume called The First Days of Panic. That book, however, takes it visually in a much more fascist police drone direction (which, hell, I wouldn’t rule out just yet), whereas this book explores more the notion of plasmoids as heretofore unrecognized forms of life, which have interacted with us in myriad ways throughout history and prehistory: something more like John Keel’s ultraterrestrials. Are we living in the timeline now of that dream? Maybe?

Whatever the true nature of the “real” drones/orbs/plasmoids/UFO/UAP stuff that is or isn’t going on in our skies is, I think, a little beside the point; the point is the search itself. The point is the looking, and trying to understand all possibilities, and to fit the best bets to the evidence from reality itself.

Or, you know, in this case, hyperreality. Images in this one were mostly made with Ideogram and Recraft, with some dabbling in Grok’s image gen, screengrabs from Sora videos, plus some use of the Remove tool in Adobe Lightroom. Text is majority ChatGPT with many human edits and improvements, told in alternating chapters between “first person” accounts and quasi-scholarly essays. Art preview below:

I might experiment in a subsequent volume with trying to embed animated gifs or even short videos from Sora if I can get the technology working adequately to share them. Ebooks don’t seem well-suited to that kind of thing, due to file sizes, though. So we will see what’s actually still feasible.

Quoting Dave Winer on Coding With ChatGPT

I’m leaving Slack, so I have been leaning heavily on ChatGPT to help me set up possible open source alternatives like Rocket Chat :thumbsdown: and Matrix :thumbshalfwayup:. I’m not much of a Terminal wizard, but from this experience of having ChatGPT guide me through the command line, I’ve learned a lot. One of the things I’ve learned is you basically always get stuck – eventually – down one or several blind alleys when explicitly following its instructions. And then it just runs you down them again and again (but it still did better than Gemini the one time I tried that, on an intractable Matrix Synapse server settings issue).

Anyway, that’s why I’m quoting this Dave Winer bit here, cause I’m apparently not the only one:

As a programming partner, ChatGPT is encyclopedic but is not good at strategy. It will drive you down blind alleys. It’s also really irritating that it rewrites your code to conform to its standards. And it has a terrible memory. Forgets things you told it specifically not to forget. It does not keep promises.

Also, because it does not suffer from human impatience, it has no problem telling you to repeat the same 5-6 checks again and again, no matter how many times you say it didn’t work and yes you already triple checked that config file in nano. Frustration, in a way, is actually valuable. It tells you when things really aren’t working and makes you question whether it’s actually valuable as a human to continue down a given path. But you can’t rely on the program itself to bring that kind of guidance to you – you have to rely on your human faculty of annoyance. Which, the more I think about it, might somehow be connected to intuition: knowing when to fold and try something else.

Still though, I would not have ever learned so much so quickly about the command line without ChatGPT backing me up. And, of course, I wouldn’t have either if the programs I am trying to run weren’t so finicky and buggy. I finally got Matrix up and running (accessing via Element), but never did sort out the correct subdomain issue I messed around with solving forever and ever. And who the hell knows if my m.room.retention settings are going to be honored. At least it will be encrypted if it’s not deleted in a purge job eventually (though hard deletion is always best policy for stale data, imo)…

Photoshop Remove tool used on original painting & historical photos

I’ve been experimenting with the new-ish Photoshop Remove tool. And no, I didn’t completely quit Adobe like I hoped/promised, but I did deeply downgrade from the All Apps plan to the Photography 20GB plan, and bought the Affinity suite of 3 apps at half price on Black Friday in order to fill in any other potential gaps of functionality as I work on other projects.

But I just wanted to share this one interesting early result where I ran the Photoshop Remove tool on a photo of a painting of mine:

For this, I selected most of the original painting in the middle with the Remove tool, waited a few seconds, and it spit that out. I actually think it’s a very interesting piece in its own right, but also when paired like this with the original source material, as a way to explore these sorts of parallel latent versions of a given work, or even of a historical moment. Reality isn’t going to be the same anymore.

Here’s a historical riff, removing the flag and pole from the famous Iwo Jima photo:

Outside comic or ironic uses, though, the possibilities here get quite dark quite quickly. Consider this flung-together example from the famous Tiananmen Square standoff moment – Tankman? What Tankman?

Similarly dark, but in a different direction: shooter, what shooter?

Nowadays, you don’t have to hire an army of re-touch artists to toss your old comrades turned enemies out of your pics anymore, you can do it yourself in a few clicks.

At the same time, I could see this tool and its deep hyperreality effects as being a potent tool for iconoclastic political critique in a world gone completely mad… Don’t like current political situation? Here’s a filter you can use to live in a parallel reality where that person/party/place/thing never even reaches your digital eyes or ears…

Plus, I think it would be pointless to try to regulate this. Where would you even start? To ask for that would be to invite tech companies to implement ever greater filtering and restrictions, and put more and more content decisions into the hands of AIs tied to corporate profit-engines. I mean, that’s going to happen anyway, but at least let’s not go begging for it. At least not until we’ve had a chance to kick the tires a bit before it gets nerfed.

Low Tech Drawing Robot

This one is basically wheels screwed to a platform, with markers duct-taped onto the side pointing down. Here’s a video of it in operation.

You put it on the floor and move it around somehow. Your foot, your hand, a stick, attaching to your dog’s leash? You decide.

Other people make high tech robots that can draw for them with a bunch of software and electronics.

I am experimenting with doing analog low tech human-powered robots instead. You can see some of my other experiments here. This is pretty new, so I’m trying to cover a sort of like “base layer” of low hanging fruit or what have you here, as I get up to speed on some of the more mechanical knowledge for more complex machines.

Artist As Propagandist: Exploring Parallel Realities With AI

Misinformation and art intersect to explore and navigate the confusion between reality and fiction that typifies our times in the work of net artist Tim Boucher.

In works that run the gamut from books and hand-printed samizdat zines to the use of generative AI for video, text, and image-making, Boucher’s work uses hyperreality to delve into the murky shadows of the Uncanny Valley, evoking a weird, sometimes disorienting feeling of surfing the very edge of the collapse of meaning. Weaving together real and invented, human and AI elements to seamlessly blur the lines between them, Boucher exploits this chaos to create new semiotic spaces for radical meaning-making. Structurally, the work appropriates, satirizes, and detourns the forms and tropes of conspiracy theory, re-imagining them as a new form of art, and igniting them with the fuel of runaway AI.

While the contents of conspiracy theories often tend toward the ghoulish, harmful, or just plain wrong, they are inherently postmodern, acting as a vehicle for questioning established truths and power relationships—an activity which serves an important social function, if in many cases misguided in its ultimate application. Conspiracy theorists reject grand “official” narratives and instead create their own ad hoc temporary webs of meaning, challenging the legitimacy of the structures we rely on and deep beliefs previously taken for granted. The work asks whether there could be a way for art to reclaim this function of social critique that conspiracy theories currently embody in the popular consciousness, redirecting it toward more fruitful and creative ends.

The artist’s professional background in content moderation and censorship informs the work, which at times borrows disinformation techniques observed in the field from state actors—repurposed as storytelling tools in open-ended, networked creative narratives—along with SEO manipulations, to show how easily depictions of “reality” can be twisted and propagated. Misinformation is used here by the artist openly—not to deceive, but to reveal how fragile our systems for defining truth really are. The works explore the artist’s role as propagandist, deploying “weaponized” artifacts to attempt to subversively actualize or undermine real or potential, current or future states.

As a satirist working with the mode of the conspiracy theorist, the artist knowingly inhabits and exaggerates the conspiratorial narrative forms they aim to critique, imploding them from within. As the Onion’s amicus brief on parody put it, “Parodists intentionally inhabit the rhetorical form of their target in order to exaggerate or implode it”—a technique central to this practice.

Inspired by Dada absurdity, the artist’s ‘Nevermades’—collections of AI-generated artworks appearing to involve famous museums and galleries—extend Duchamp’s readymades concept into the post-truth, remote-first digital age, challenging the idea that authenticity requires physical presence – or even actual existence in the first place. These imagined or “aspirational” artworks (like flooding the Guggenheim Museum, and filling it with willow trees and beavers) comment on the art world’s status symbols—galleries, exhibitions, facades and physical artifacts—that can now be artificially fabricated at scale, significantly challenging their value in an online world dominated by images.

The use of AI serves to heighten the inherent tensions in the work. AI is used consciously as both a force that flattens expression into sameness and conformity and as a tool to rebel against the algorithmic culture of likes, shares, and validation – by exploiting and exposing the outliers, anomalies, errors, and vulnerabilities of these technologies. By transparently incorporating AI, the work proudly wears the use of these technologies as a kind of “scarlet letter,” confronting head-on the stigma against its use in creative sectors, and reimagining it as a vehicle and medium all its own for artistic exploration. At the same time, it shines a light on the absurdities and limitations of these technologies, and holds a mirror up to our own evolving reactions to them.

Ultimately, this metamodernist body of work oscillates between the deadly serious and the dangerously stupid and absurd, revealing the fragile and easily manipulated nature of our information systems and the social and political systems which rely on them. The work encourages the audience to consider conspiracy theory as an unrecognized folk art form—provocative and dangerous, to be sure, but one in many ways much like any art or cultural movement that questions authority. It disrupts the established order, challenges accepted facts, and compels us to face the instability of the narratives we hold onto, and, in its best form, opens up the space for changing how things are today into how they could be, how we would like them to be.

AI, misinformation, conspiracy, and hyperreality converge here to ask a simple but potent question: what is real, and who gets to decide?

Curatorial Statement: “Organic Data Weaving”

Tim Boucher’s “Organic Data Weaving” seamlessly merges the organic vitality of nature with the abstract logic of digital hyperreality. Woven willow sculptures, embodying the natural profusion of growth, stand alongside AI-generated projections that evolve across the gallery walls. The dynamic interplay between the physicality of willow forms and the insubstantiality of digital projections invites viewers to contemplate the convergence of artificial and organic intelligence.

The woven willow structures reflect the interconnectedness of data networks, echoing the visual representations of data relationships in the projected images. The sculptures’ interlocking patterns and dynamic curves mirror the fluid and shifting nature of data itself, presenting a dialogue between natural growth and the abstract forms of digital information. By juxtaposing these tangible and intangible elements, “Organic Data Weaving” reveals the complex, evolving narrative of our relationship with technology, nature, and the blurred boundaries of hyperreality.


That’s a curatorial statement I had ChatGPT help me write for a recent project of mine, an exploration of what woven willow sculptural forms juxtaposed with AI projected lights and imagery might look like. Photos from the “exhibit” are here.

I’ll pull out a few of my favorites to highlight below.

Without any more context or knowledge about the origins of these images, I would personally be hard-pressed to not take them at face value and believe they were actually cool sculptures which exist somewhere, or did at one time.

But in actual fact, they are nevermades which exist in a hyperreality adjacent to ours. They are aspirational image explorations on a theme, some using Dalle, some Ideogram AI. They are part of a larger experiment in misinformation as art.

But these raise a million other important questions for me as an artist. Namely, if I could essentially simulate a lifetime’s worth of artistic achievements in an evening, and get basically high-quality gallery photos of them as though they were real physical things, where does that leave us existentially relative to actual real physical things? Where does that leave us relative to a lifetime’s worth of artistic achievements?

In a world increasingly centered on the cult of the Almighty Image, and the Almighty Image is continuously exposed as a liar on its own altar at every turn, how are we to proceed?

I saw “real” photos from an art gallery setting in London earlier, and thought to myself: some of these look less high-quality than what I was able to generate with AI. The AI images literally look better than the real thing.

I think that’s hyperreality: getting sucked down that wormhole. And it’s exactly where we’re stuck now, collectively and individually.

Charlie Warzel’s piece in The Atlantic on hurricane disinfo goes down a parallel path in a somewhat different direction, but intersects interestingly with our current one:

What is clear is that a new framework is needed to describe this fracturing. Misinformation is too technical, too freighted, and, after almost a decade of Trump, too political.

Hyperreality stands out to me as a relevant and still potentially useful analytical framework that is wider and not so fraught, and which can encompass this idea of the “artist as propagandist” who creates unreal things in order to change or influence real things.

Also from Warzel’s piece:

But as Michael Caulfield, an information researcher at the University of Washington, has argued, “The primary use of ‘misinformation’ is not to change the beliefs of other people at all. Instead, the vast majority of misinformation is offered as a service for people to maintain their beliefs in face of overwhelming evidence to the contrary.”

Interestingly, in other contexts outside of conspiracy fear-mongering, we often call people who can cling to an alternative vision of reality in the face of overwhelming opposition “visionaries,” and we culturally usually cheer them on as they succeed in implementing that vision in actual reality. Unfortunately, an exceedingly great number of such “visionaries” in our day and age have been subsumed by vanity and wealth, and where they might have been, or might believe themselves to be, luminaries, they emit only a kind of sticky darkness…

To me these willow-works, both my IRL ones and my ORL (outside real life?) hyperreal ones, play somewhere in a space that lies orthogonally in opposition to all that. Willow to me is profusion, proof of abundant life, of generous, ridiculously abundant and productive life, of reified embodied living sunlight. The reality of that when you feel it in your hands shatters all false darknesses, and returns us somehow deeply, instinctually, ancestrally, immediately back in tune with the Overwave, the wave from which all other waves are born…

Multi-Concept Addressing for Latent Space Navigation

Preface: How I got here

I’ve been hanging out at the library lately, and realized that my Dewey Decimal System (DDS) knowledge has gotten pretty rusty. I found a “concise” summary of it here, and printed off the First, Second, and Third Summaries, which cover The Ten Main Classes, The Hundred Divisions, and the Thousand Sections, respectively.

I knew there was some controversy about the DDS, but I hadn’t checked in on what it was these past couple decades, I guess. But in skimming through the classes, divisions, and sections, it became apparent how lop-sided its distribution of identifying numbers is towards all things European. Again and again, it literally marginalizes entire cultures and their achievements by sticking them into “grab bag” left-overs like:

  • 290 Other religions (where 220-280 are all overtly dedicated to Christianity, and 200-219 are no doubt heavily influenced by that tradition)
  • 490 Other languages (420-480 are all European languages)
  • 890 Other literatures (810-880 are all Euro or American lit)

I won’t bore the non-taxonomically inclined among you by going line by line through The Thousand Sections (though I am strongly tempted to, but that would prolong this preface even more unnecessarily than it already is). But some curiosities jump out: Islam, 297, doesn’t have its number to itself, but shares it with Babism & the Bahai Faith. Likewise, Buddhism is not given its own name, let alone its own number; it is a subdivision of 294, Religions of Indic Origin. I don’t know the exact numbers, but based on some initial skimming on Perplexity, it looks like Buddhism and Islam combined are roughly equivalent to Christianity in terms of number of followers globally. But the Dewey Decimal System doesn’t represent these other dimensions of social reality.
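To make the skew concrete, here’s a quick sketch laying the 200s divisions out as data – the division labels are paraphrased from the DDC summaries, so treat them as approximate rather than official wording:

```python
# The DDC "Hundred Divisions" for the 200s (Religion), labels paraphrased
# from the published summaries -- approximate, not official wording.
dewey_200s = {
    200: "Religion",
    210: "Philosophy & theory of religion",
    220: "Bible",
    230: "Christianity & Christian theology",
    240: "Christian practice & observance",
    250: "Christian pastoral practice & religious orders",
    260: "Christian organization, social work & worship",
    270: "History of Christianity",
    280: "Christian denominations",
    290: "Other religions",
}

# Count how many of the ten divisions are explicitly Christian in scope.
christian = [n for n, label in dewey_200s.items()
             if "Christian" in label or label == "Bible"]
print(f"{len(christian)} of {len(dewey_200s)} divisions are explicitly Christian")
```

Seven of the ten divisions go to one tradition, while every other religion on Earth gets squeezed into 290 – which is exactly the “grab bag” pattern described above.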

Anyway, all that is to say, with an eye to not duplicating the failings of the DDS as a metaphorical jumping off point, couldn’t it be an interesting exercise to come up with some kind of flexible, less judgemental addressing system for navigating high-dimensional latent spaces, such as those you encounter as a user of generative AI models and systems? I’ve already experimented in this direction visually in the past, thinking about how gen AI image creation systems like Midjourney could benefit from some kind of hand-held controller, which would let you rapidly assign and modify dimensional values on the fly, in order to traverse neighborhoods and relations in more or less real time. Latent space as navigable VR, if you will.

I took this problem to three different AI genies, asking, paraphrased: give me a Dewey Decimal System for latent space. The first genie, whose angelic name is ChatGPT-4o, gave me answers that were mildly insightful, but not interesting enough to pursue in depth. The second genie, whose moniker is Claude, gave me results which were promising, but a UX response that was riddled with errors and hobbled by rate limits. The third genie, named for the Winds, brought with it clear thinking and an in-depth ability to solve the problem through interrogation. That genie’s failings are its refusal to follow custom instructions or the equivalent at a prompt level (“code only, no explanation”), and its slow speed. But what it lacked in those areas, it made up for in its ability to guide me towards a tentatively adequate V1/MVP, which is presented here without further ado after this absurdly long preface. My sincerest sorry/not sorry.


Introduction

Disclaimer:

The Multi-Concept Addressing system (MCA) is an attempt by a non-technical author to develop a preliminary schema for one way of potentially addressing locations within latent spaces. It may not prove to be the “best way,” but it seemed good enough to at least put out there to get the conversation started.

Much of the text that follows comes directly from Mistral, with light edits from me.

MCA: The “Dewey Decimal System” for Latent Space in Generative AI

Multi-Concept Addressing, or MCA, is a proposed addressing system designed to navigate and interpret high-dimensional latent spaces in generative AI models. It provides a structured and interpretable way to represent complex scenes and images, much like the Dewey Decimal System organizes information in libraries. (*See: Preface)

Key Components:

  • Base Concepts: High-level concepts that define the broad categories of elements in a scene.
  • Sub-Concepts: Detailed information about specific elements within the base concepts.
  • Relations: Relationships between different concepts, capturing how they interact.
  • Context: Additional contextual information that provides nuance and depth to the scene.

Operational Principles: MCA operates on membership degrees or intensities (that is, whether an image, for example, contains members of a particular concept, and to what degree), allowing for precise control over the presence and importance of various concepts and relationships within a scene.

Problem Solved: MCA addresses the challenge of navigating and understanding high-dimensional latent spaces in generative AI models in something that approximates a human-readable format. It provides (hopefully) a holistic and flexible solution that can potentially be adapted to various contexts, including image generation, semantic analysis, and data retrieval.

High-Level Example

Consider the following natural language prompt:

a cat riding a bicycle wearing a football helmet playing a banjo in outer space

This prompt contains multiple concepts and relationships that need to be represented in a structured and interpretable way. Let’s see how MCA can achieve this. [Back to my text with Mistral excerpts included below.]

The first part of an MCA address consists of a string like this representing base concepts and weights:

Ani90Obj80Env90Act80Acc70

Where the name values for high-level (or “base”) concepts represented in this query are:

Ani: Animals
Obj: Objects
Env: Environments
Act: Activities
Acc: Accessories

Using only this for addressing just leaves us in a very fuzzy general vicinity… maybe something like a room in a given library, or a big shelving unit. We might be able to find what we need, but without more specific information we’re likely to stumble around looking for it.
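To make the format concrete, here is a small Python sketch (my own, not part of the MCA proposal itself) of how that base-concept segment could be parsed back into codes and weights; the function name is hypothetical:

```python
import re

# Hypothetical helper: parse a base-concept segment like
# "Ani90Obj80Env90Act80Acc70" into {code: weight} pairs.
def parse_base_concepts(segment: str) -> dict:
    """Split a base-concept string into letter codes and their 0-100 weights."""
    pairs = re.findall(r"([A-Za-z]+?)(\d+)", segment)
    return {code: int(weight) for code, weight in pairs}

print(parse_base_concepts("Ani90Obj80Env90Act80Acc70"))
# → {'Ani': 90, 'Obj': 80, 'Env': 90, 'Act': 80, 'Acc': 70}
```

The non-greedy letter match keeps each code glued to the digits that follow it, so the segment needs no delimiters at all.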

Base Concepts Sample List (Provisional)

As an aside, I had Mistral work up a set of what might be the top base-concept names and abbreviations. I kept asking, do you have any more edge cases, and it kept giving more and more. Eventually I gave up, as this seems like an adequately representative set for a v1 of this concept. Here is that list, for completeness (though it also made a list that was much, much longer, and I eventually had to push the stop button. I’ll at least spare you that one). I think there could be better three-letter codes representing each concept, but I left them as the first three letters to keep it simple. Here it is:

  • Animals (Ani)
  • Objects (Obj)
  • Environments (Env)
  • Activities (Act)
  • Accessories (Acc)
  • People (Peo)
  • Plants (Pla)
  • Structures (Str)
  • Weather (Wea)
  • Time (Tim)
  • Emotions (Emo)
  • Events (Eve)
  • Sounds (Sou)
  • Text (Tex)
  • Abstract Concepts (Abs)
  • Technology (Tec)
  • Food and Drink (Foo)
  • Transportation (Tra)
  • Art and Culture (Art)
  • Natural Phenomena (Nat)
  • Science and Mathematics (Sci)
  • Health and Medicine (Hea)
  • Education (Edu)
  • Sports (Spo)
  • Mythology and Folklore (Myt)
  • Fantasy and Science Fiction (Fan)
  • Geography (Geo)
  • History (His)
  • Lighting (Lig)
  • Colors (Col)
  • Textures (Tex)
  • Movement (Mov)
  • Interactions (Int)
  • Symbols (Sym)
  • Virtual and Digital (Vir)
  • Celestial Bodies (Cel)
  • Microorganisms (Mic)
  • Chemicals (Che)
  • …and on and on

Navigating Sub-Concepts

Getting back to the addressing for that specific reference prompt: if we’re looking at all of this in JSON, then the next part of the address derives from sub-concepts within those broader base concepts, which could be represented like this:

{
  "SubConcepts": {
    "Ani": "cat90",
    "Obj": "bike80banjo70",
    "Env": "space90",
    "Act": "ride80play70wear60",
    "Acc": "helm70"
  }
}

So our MCA partial address now has the base concepts, and sub-concepts, which would look something like this, give or take:

Ani90Obj80Env90Act80Acc70::cat90bike80banjo70space90ride80play70wear60helm70

It looks inscrutable-ish, but it’s not really. It’s just a way of compressing the JSON schema into a single line.
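As an illustration of that compression step, here is a minimal Python sketch (my own, with a hypothetical function name) that flattens the base-concept string and the sub-concept JSON into the single-line form shown above:

```python
# Hypothetical helper: compress the base-concept segment and the
# sub-concepts mapping into the "::"-separated address line.
def compress_address(base: str, sub_concepts: dict) -> str:
    # Sub-concept fragments are concatenated in the order given,
    # mirroring the order of the base-concept codes.
    return base + "::" + "".join(sub_concepts.values())

sub = {
    "Ani": "cat90",
    "Obj": "bike80banjo70",
    "Env": "space90",
    "Act": "ride80play70wear60",
    "Acc": "helm70",
}
print(compress_address("Ani90Obj80Env90Act80Acc70", sub))
# → Ani90Obj80Env90Act80Acc70::cat90bike80banjo70space90ride80play70wear60helm70
```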

Relations & Context

But in its current form, we still don’t necessarily know enough about the different entities and actions to find exactly what we’re looking for. We might get close, but still land on major errors. We need to know something more about the relationships between all the entities or values that are named, as well as any larger context not otherwise captured in the address so far. Otherwise our navigation accuracy is going to be pretty low.

In JSON, these might look like:

{
  "Relations": [
    "Cat-RidingOn-Bike",
    "Cat-Wearing-Helm",
    "Cat-Playing-Banjo"
  ],
  "Context": [
    "Surreal",
    "Humorous",
    "BrightColors"
  ]
}

And if we squish that back down into the full MCA multi-concept address, with weights, it might look something like this:

Ani90Obj80Env90Act80Acc70::cat90bike80banjo70space90ride80play70wear60helm70::Cat-RidingOn-Bike_Cat-Wearing-Helm_Cat-Playing-Banjo::Surreal_Humorous_BrightColors

Granted, we could probably also assign weights to the relations and context elements, but I didn’t want to complicate it any more than it already is.
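Going the other direction, a full four-segment address like the one above can be split back apart into its components. This is a hedged sketch of my own (the function name is hypothetical), using the delimiters the examples establish: `::` between segments and `_` between relation or context items:

```python
# Hypothetical helper: split a full MCA address into its four segments.
def parse_mca(address: str) -> dict:
    base, subs, relations, context = address.split("::")
    return {
        "BaseConcepts": base,
        "SubConcepts": subs,
        "Relations": relations.split("_"),
        "Context": context.split("_"),
    }

mca = ("Ani90Obj80Env90Act80Acc70"
       "::cat90bike80banjo70space90ride80play70wear60helm70"
       "::Cat-RidingOn-Bike_Cat-Wearing-Helm_Cat-Playing-Banjo"
       "::Surreal_Humorous_BrightColors")
parsed = parse_mca(mca)
print(parsed["Relations"])
# → ['Cat-RidingOn-Bike', 'Cat-Wearing-Helm', 'Cat-Playing-Banjo']
```

One design note: because `_` separates items and `-` joins the parts of a relation triple, the two delimiters can’t collide, which is what keeps the flat string reversible.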

Putting It Together: Full MCA Example Schema

Putting the above extended MCA address all together again as JSON to elaborate its components:

{
  "MCA": "Ani90Obj80Env90Act80Acc70::cat90bike80banjo70space90ride80play70wear60helm70::Cat-RidingOn-Bike_Cat-Wearing-Helm_Cat-Playing-Banjo::Surreal_Humorous_BrightColors",
  "BaseConcepts": "Ani90Obj80Env90Act80Acc70",
  "SubConcepts": {
    "Ani": "cat90",
    "Obj": "bike80banjo70",
    "Env": "space90",
    "Act": "ride80play70wear60",
    "Acc": "helm70"
  },
  "Relations": [
    "Cat-RidingOn-Bike",
    "Cat-Wearing-Helm",
    "Cat-Playing-Banjo"
  ],
  "Context": [
    "Surreal",
    "Humorous",
    "BrightColors"
  ]
}

Is this actually simple and flexible, like I had hoped setting out on today’s AI side-quest? Hard for me to be the judge, but so far it is the only thing of its kind that I have found out there (though I did find some adjacent concepts I’m not yet well-versed enough to explore in depth), and it does, to me at least, address a real and specific need, whether or not it, erm, gives a completely reproducible address every time in all situations. It is still maybe a bit vague, but it at least narrows the task of navigation down to a more constrained dimensional space, with keys to the values that could be changed in searching for the specific “shelf” that contains what you are after.

Really, what I imagine in all of this is a bunch of conceptual characteristics mapped to sliders in a UI, where fully on means that a given characteristic/dimension/concept/tag is applied to the max within the desired outputs, fully off means that attribute is excluded, and all the values in between are available too. Using a machine mapped to this would then be about playing around with different conceptual sliders to emphasize or de-emphasize members of a given group or groups of high-dimensional characteristics in the latent space.
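The slider idea can be sketched in a few lines of Python. This is a toy illustration of my own (the function and names are hypothetical, not a real UI API): each concept maps to a 0–100 slider, and reading the sliders re-emits the base-concept segment of an address, with fully-off concepts excluded:

```python
# Toy sketch: map named 0-100 sliders back to a base-concept segment.
def sliders_to_segment(sliders: dict) -> str:
    parts = []
    for code, value in sliders.items():
        value = max(0, min(100, value))  # clamp to the slider's range
        if value > 0:                     # fully off excludes the concept
            parts.append(f"{code}{value}")
    return "".join(parts)

sliders = {"Ani": 90, "Obj": 80, "Env": 90, "Act": 80, "Acc": 0}
print(sliders_to_segment(sliders))
# → Ani90Obj80Env90Act80   (Acc is excluded at 0)
```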

Phew, lot of words to get out here, but I think that brings us to the end for now, if not the “conclusion.”


Post Script

I am trying to find a natural language UI design tool that can output a version of the above as a simple web app, something to the effect of what I described in this prompt:

app for navigating addresses in latent space based on given values (concepts, subconcepts, relations, context) and their weights. the app consists of sliders paired to specific example attributes or concepts which can be adjusted to yield different results in a viewer window that shows that location

This is somewhat janky, but a quick version of that made using UIzard.io just to leave you with something more concrete to consider:

French Interview with Bruno Guglielminetti of Mon Carnet

I had a nice time recording this interview last week with Quebec tech journalist, Bruno Guglielminetti. The finished piece is part of a longer podcast you can listen to here or below via the Soundcloud embed.

This is only my second interview trying to speak about these topics in French, and the first one that is an actual audio recording of me stumbling through and searching for my words. But I think I at least succeeded in getting my main points across – I hope.
