Preface: How I got here
I’ve been hanging out at the library lately, and realized that my Dewey Decimal System (DDS) knowledge has gotten pretty rusty. I found a “concise” summary of it here, and printed off the First, Second, and Third Summaries, which cover The Ten Main Classes, The Hundred Divisions, and the Thousand Sections, respectively.
I knew there was some controversy about the DDS, but I hadn’t checked in on it these past couple decades, I guess. But in skimming through the classes, divisions, and sections, it became apparent how lop-sided its distribution of identifying numbers is towards all things European. Again and again, it literally marginalizes entire cultures and their achievements by sticking them into “grab bag” left-overs like:
- 290 Other religions (where 220-280 are all overtly dedicated to Christianity, and 200-219 are no doubt heavily influenced by that tradition)
- 490 Other languages (420-480 are all European languages)
- 890 Other literatures (810-880 are all Euro or American lit)
I won’t bore the non-taxonomically inclined among you by going line by line through The Thousand Sections (though I am strongly tempted to, that would prolong this already long preface unnecessarily). But some curiosities jump out: Islam (297) doesn’t get a number to itself, since 297 also covers Babism & the Bahai Faith. Likewise, Buddhism is not given its own name, let alone its own number; it is a subdivision of 294, Religions of Indic Origin. I don’t know the exact numbers, but based on some initial skimming on Perplexity, it looks like Buddhism and Islam combined have roughly as many followers worldwide as Christianity. But the Dewey Decimal System doesn’t represent these other dimensions of social reality.
Anyway, all that is to say: using the DDS as a metaphorical jumping-off point, and with an eye to not duplicating its failings, couldn’t it be an interesting exercise to come up with some kind of flexible, less judgemental addressing system for navigating high-dimensional latent spaces, such as those you encounter as a user of generative AI models and systems? I’ve already experimented in this direction visually in the past, thinking about how gen AI image creation systems like Midjourney could benefit from some kind of hand-held controller, which would let you rapidly assign and modify dimensional values on the fly, in order to traverse neighborhoods and relations in more or less real time. Latent space as navigable VR, if you will.
I took this problem to three different AI genies with, paraphrased, “give me a Dewey Decimal System for latent space.” The first genie, whose angelic name is ChatGPT-4o, gave me answers that were mildly insightful, but not interesting enough to pursue in depth. The second genie, whose moniker is Claude, gave me results which were promising, and a UX response that was riddled with errors and hobbled by rate limits. The third genie, named for the winds, brought with it clear thinking and an in-depth ability to solve the problem through interrogation. That genie’s failings are its refusal to follow custom instructions or the equivalent at a prompt level (“code only, no explanation”), and its slow speed. But what it lacked in those areas, it made up for in its ability to guide me towards a tentatively adequate V1/MVP, which is presented here without further ado after this absurdly long preface. My sincerest sorry/not sorry.
Introduction
Disclaimer:
The Multi-Concept Addressing system (MCA) is an attempt by a non-technical author to develop a preliminary schema for one way of potentially addressing locations within latent spaces. It may not prove to be the “best way,” but seemed good enough to at least put out to get the conversation started.
Much of the text that follows comes directly from Mistral, with light edits from me.
MCA: The “Dewey Decimal System” for Latent Space in Generative AI
Multi-Concept Addressing, or MCA, is a proposed addressing system designed to navigate and interpret high-dimensional latent spaces in generative AI models. It provides a structured and interpretable way to represent complex scenes and images, much like the Dewey Decimal System organizes information in libraries. (*See: Preface)
Key Components:
- Base Concepts: High-level concepts that define the broad categories of elements in a scene.
- Sub-Concepts: Detailed information about specific elements within the base concepts.
- Relations: Relationships between different concepts, capturing how they interact.
- Context: Additional contextual information that provides nuance and depth to the scene.
Operational Principles: MCA operates on membership degrees or intensities (that is, whether an image, for example, contains members of a particular concept, and how much), allowing for precise control over the presence and importance of various concepts and relationships within a scene.
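To make the idea of membership degrees a bit more concrete, here is a minimal sketch of my own in Python (not part of any formal spec), assuming intensities run from 0 (excluded) to 100 (maximally present):

# A minimal sketch, assuming membership degrees run from 0 to 100,
# where 0 means "excluded" and 100 means "maximally present".
scene = {
    "Animals": 90,        # animals strongly present
    "Environments": 90,   # the environment matters a lot
    "Accessories": 70,    # accessories present but less emphasized
    "Sounds": 0,          # sound-related concepts explicitly excluded
}

def clamp(degree: int) -> int:
    """Keep a membership degree inside the assumed 0-100 range."""
    return max(0, min(100, degree))

scene = {concept: clamp(degree) for concept, degree in scene.items()}
print(scene)  # {'Animals': 90, 'Environments': 90, 'Accessories': 70, 'Sounds': 0}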
Problem Solved: MCA addresses the challenge of navigating and understanding high-dimensional latent spaces in generative AI models in something that approximates a human-readable format. It provides (hopefully) a holistic and flexible solution that can potentially be adapted to various contexts, including image generation, semantic analysis, and data retrieval.
High-Level Example
Consider the following natural language prompt:
a cat riding a bicycle wearing a football helmet playing a banjo in outer space
This prompt contains multiple concepts and relationships that need to be represented in a structured and interpretable way. Let’s see how MCA can achieve this. [Back to my text with Mistral excerpts included below.]
The first part of an MCA address consists of a string like this representing base concepts and weights:
Ani90Obj80Env90Act80Acc70
Where the abbreviations stand for the high-level (or “base”) concepts represented in this query:
Ani: Animals
Obj: Objects
Env: Environments
Act: Activities
Acc: Accessories
Using only this for addressing just leaves us in a very fuzzy general vicinity… maybe something like a room in a given library, or a big shelving unit. We might be able to find what we need, but we’re most likely to stumble around looking for it without more specific information.
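For the programmatically inclined, here is a rough Python sketch, purely my own illustration, assuming the segment format is simply a concept code followed by its weight, repeated, of how this base-concept segment could be pulled apart and put back together:

import re

def parse_base_concepts(segment: str) -> dict:
    """Split a base-concept segment like 'Ani90Obj80...' into {code: weight} pairs."""
    return {code: int(weight) for code, weight in re.findall(r"([A-Z][a-z]+)(\d+)", segment)}

def build_base_concepts(weights: dict) -> str:
    """Re-assemble {code: weight} pairs into the compact segment string."""
    return "".join(f"{code}{weight}" for code, weight in weights.items())

print(parse_base_concepts("Ani90Obj80Env90Act80Acc70"))
# {'Ani': 90, 'Obj': 80, 'Env': 90, 'Act': 80, 'Acc': 70}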
Base Concepts Sample List (Provisional)
As an aside, I had Mistral work up a set of what might be the top base concept names and abbreviations. I kept asking, “do you have any more edge cases?” and it kept giving more and more. Eventually I gave up, as this seems like an adequately representative set for a v1 of this concept. Here is that list, for completeness (though it also made a list that was much, much, much longer, and I eventually had to push the stop button; I’ll at least spare you that one). I think there could be better three-letter codes representing each concept, but I left them as the first three letters to keep it simple. Here it is:
- Animals (Ani)
- Objects (Obj)
- Environments (Env)
- Activities (Act)
- Accessories (Acc)
- People (Peo)
- Plants (Pla)
- Structures (Str)
- Weather (Wea)
- Time (Tim)
- Emotions (Emo)
- Events (Eve)
- Sounds (Sou)
- Text (Tex)
- Abstract Concepts (Abs)
- Technology (Tec)
- Food and Drink (Foo)
- Transportation (Tra)
- Art and Culture (Art)
- Natural Phenomena (Nat)
- Science and Mathematics (Sci)
- Health and Medicine (Hea)
- Education (Edu)
- Sports (Spo)
- Mythology and Folklore (Myth)
- Fantasy and Science Fiction (Fan)
- Geography (Geo)
- History (His)
- Lighting (Lig)
- Colors (Col)
- Textures (Tex)
- Movement (Mov)
- Interactions (Int)
- Symbols (Sym)
- Virtual and Digital (Vir)
- Celestial Bodies (Cel)
- Microorganisms (Mic)
- Chemicals (Che)
- …and on and on
Navigating Sub-Concepts
Getting back to the address for that specific reference prompt: if we’re looking at all of this in JSON, then the next part of the address derives from sub-concepts within those broader base concepts, which could be represented like this:
{
  "SubConcepts": {
    "Ani": "cat90",
    "Obj": "bike80banjo70",
    "Env": "space90",
    "Act": "ride80play70wear60",
    "Acc": "helm70"
  }
}
So our MCA partial address now has the base concepts and sub-concepts, and would look something like this, give or take:
Ani90Obj80Env90Act80Acc70::cat90bike80banjo70space90ride80play70wear60helm70
It looks inscrutable-ish, but it’s not really. It’s just a way of compressing the JSON schema into a single line.
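As a sketch of that compression (my own guess at one workable encoding, not an official one), flattening the sub-concept JSON into the second segment and joining it to the base segment with the assumed “::” separator reproduces the partial address above:

# A rough sketch of flattening the sub-concept JSON into the second
# segment of the address; the "::" separator is an assumption.
sub_concepts = {
    "Ani": "cat90",
    "Obj": "bike80banjo70",
    "Env": "space90",
    "Act": "ride80play70wear60",
    "Acc": "helm70",
}

base_segment = "Ani90Obj80Env90Act80Acc70"

# Concatenate the sub-concept strings in base-concept order, then join
# the two segments with "::".
sub_segment = "".join(sub_concepts[code] for code in ["Ani", "Obj", "Env", "Act", "Acc"])
partial_address = f"{base_segment}::{sub_segment}"
print(partial_address)
# Ani90Obj80Env90Act80Acc70::cat90bike80banjo70space90ride80play70wear60helm70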
Relations & Context
But even in its current form, we don’t necessarily know enough about the different entities and actions to find exactly what we’re looking for. We might get close, but still have major errors. We need to know something more about the relationships between all these named entities and values, as well as any larger context not otherwise captured in the address so far. Otherwise our accuracy for navigation is going to be pretty low.
In JSON, these might look like:
"Relations": [
"Cat-RidingOn-Bike",
"Cat-Wearing-Helm",
"Cat-Playing-Banjo"
],
"Context": [
"Surreal",
"Humorous",
"BrightColors"
]
And if we squish that back down into the full MCA multi-concept address, with weights, it might look something like this:
Ani90Obj80Env90Act80Acc70::cat90bike80banjo70space90ride80play70wear60helm70::Cat-RidingOn-Bike_Cat-Wearing-Helm_Cat-Playing-Banjo::Surreal_Humorous_BrightColors
Granted, we could probably also assign weights to the relations and context elements, but I didn’t want to complicate it any more than it already is.
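Going the other way, a full address like this could be split back into its parts; the “::” between segments, the “_” between relation/context entries, and the “-” inside relation triples are all assumptions carried over from the example, not a fixed standard:

# A sketch of pulling a full MCA address apart again, using the
# separators assumed in the example above.
address = (
    "Ani90Obj80Env90Act80Acc70"
    "::cat90bike80banjo70space90ride80play70wear60helm70"
    "::Cat-RidingOn-Bike_Cat-Wearing-Helm_Cat-Playing-Banjo"
    "::Surreal_Humorous_BrightColors"
)

base, subs, relations, context = address.split("::")

# Relations are Subject-Relation-Object triples; context is a flat tag list.
relation_triples = [tuple(r.split("-")) for r in relations.split("_")]
context_tags = context.split("_")

print(relation_triples)  # [('Cat', 'RidingOn', 'Bike'), ('Cat', 'Wearing', 'Helm'), ('Cat', 'Playing', 'Banjo')]
print(context_tags)      # ['Surreal', 'Humorous', 'BrightColors']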
Putting It Together: Full MCA Example Schema
Putting the above extended MCA address all together again as JSON to elaborate its components:
{
  "MCA": "Ani90Obj80Env90Act80Acc70::cat90bike80banjo70space90ride80play70wear60helm70::Cat-RidingOn-Bike_Cat-Wearing-Helm_Cat-Playing-Banjo::Surreal_Humorous_BrightColors",
  "BaseConcepts": "Ani90Obj80Env90Act80Acc70",
  "SubConcepts": {
    "Ani": "cat90",
    "Obj": "bike80banjo70",
    "Env": "space90",
    "Act": "ride80play70wear60",
    "Acc": "helm70"
  },
  "Relations": [
    "Cat-RidingOn-Bike",
    "Cat-Wearing-Helm",
    "Cat-Playing-Banjo"
  ],
  "Context": [
    "Surreal",
    "Humorous",
    "BrightColors"
  ]
}
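If it’s useful, here is a small sketch (field names assumed to follow the example schema above) of building the single-line “MCA” string from the structured components, so the two stay consistent:

import json

def to_mca(rec: dict) -> str:
    """Join the four components with '::' into the compact address form (assumed format)."""
    subs = "".join(rec["SubConcepts"].values())
    return "::".join([rec["BaseConcepts"], subs, "_".join(rec["Relations"]), "_".join(rec["Context"])])

record = {
    "BaseConcepts": "Ani90Obj80Env90Act80Acc70",
    "SubConcepts": {
        "Ani": "cat90",
        "Obj": "bike80banjo70",
        "Env": "space90",
        "Act": "ride80play70wear60",
        "Acc": "helm70",
    },
    "Relations": ["Cat-RidingOn-Bike", "Cat-Wearing-Helm", "Cat-Playing-Banjo"],
    "Context": ["Surreal", "Humorous", "BrightColors"],
}

record["MCA"] = to_mca(record)
print(json.dumps(record, indent=2))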
Is this actually simple and flexible like I had hoped when setting out on Today’s AI Side-Quest™? Hard for me to be the judge, but so far it is the only thing of its kind that I have found out there (though I did find some adjacent concepts I’m not yet well-versed enough to explore in depth), and it does, to me at least, address a real and specific need, whether or not it, erm, completely accurately gives a reproducible address every time in all situations. It is still maybe a bit vague, but it at least hopefully narrows the task of navigation down to a more constrained dimensional space, with keys to the values that could be changed while searching for the specific “shelf” that contains what you are after.
Really, what I imagine in all of this is a bunch of conceptual characteristics mapped to sliders in a UI, where fully on means that a given characteristic/dimension/concept/tag is applied to the max within the desired outputs, fully off means that attribute is excluded, and every value in between is available. Then using a machine mapped to this would be about playing around with different conceptual sliders to emphasize or de-emphasize members of a given group or groups of high-dimensional characteristics in the latent space.
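As a toy illustration of that slider idea (the function names here are hypothetical, purely for illustration), each slider is just a 0-100 value tied to a concept, and moving one re-emits the corresponding piece of the address:

# A toy sketch of the "sliders" idea: each slider is a 0-100 value tied
# to a base concept, and moving a slider re-emits the base-concept segment.
sliders = {"Ani": 90, "Obj": 80, "Env": 90, "Act": 80, "Acc": 70}

def move_slider(sliders: dict, concept: str, value: int) -> dict:
    """Set one concept's intensity; 0 excludes it entirely."""
    updated = dict(sliders)
    if value <= 0:
        updated.pop(concept, None)   # fully off: drop the concept from the address
    else:
        updated[concept] = min(value, 100)
    return updated

def base_segment(sliders: dict) -> str:
    return "".join(f"{code}{weight}" for code, weight in sliders.items())

print(base_segment(move_slider(sliders, "Env", 40)))   # de-emphasize the environment
print(base_segment(move_slider(sliders, "Acc", 0)))    # exclude accessories altogether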
Phew, lot of words to get out here, but I think that brings us to the end for now, if not the “conclusion.”
Post Script
I am trying to find a natural language UI design tool that can output a version of the above as a simple web app, something to the effect of what I described in this prompt:
app for navigating addresses in latent space based on given values (concepts, subconcepts, relations, context) and their weights. the app consists of sliders paired to specific example attributes or concepts which can be adjusted to yield different results in a viewer window that shows that location
This is somewhat janky, but here’s a quick version of that, made using UIzard.io, just to leave you with something more concrete to consider: