Questionable content, possibly linked


Jaron Lanier on right to be left alone

I got kinda lost and didn’t finish reading this New Yorker article by Jaron Lanier, but I like him in general and wanted to capture this passage for further digestion:

For years, I worked on the E.U.’s privacy policies, and I came to realize that we don’t know what privacy is. It’s a term we use every day, and it can make sense in context, but we can’t nail it down well enough to generalize. The closest we have come to a definition of privacy is probably “the right to be left alone,” but that seems quaint in an age when we are constantly dependent on digital services. In the context of A.I., “the right to not be manipulated by computation” seems almost correct, but doesn’t quite say everything we’d like it to.

This question of “rights” becomes an extremely tough one when you ask who exactly would enforce such a right to, for example, not be manipulated by computation. How exactly would it work, when we’re swimming in a sea of things that are constantly computationally manipulating us, even down to the design of the physical products and infrastructure we rely on? All of it, the whole system, the whole complex, manipulates us into the shape it wants. Everything gardens.

As the Unabomber’s case proved, you can’t simply go into the woods, live in an austere cabin, and be left alone anymore. Society comes to find you. Other people manipulate the landscape around you. Who, then, can enforce such a supposed right to be “left alone”?

And as a corollary, in such a world, what can true freedom look like anymore? What refuges are left to us, and how can we access them?

Is the right to not be computationally manipulated inclusive of, say, the ability to opt out of being required to use computers or internet-based services in educational, professional, and other environments? Like, can we get it to mean that analog & human-first alternatives must always be available? In other words, the right to use a rotary phone? Or, what about the right to not have to be distracted by devices, or by constant interruptions, updates, and notifications? If we’re going to innovate in this direction, let’s go all the way… All the way where?

Notes on The Return of the Magicians

The Return of the Magicians is book #93 of the AI Lore books series, by Lost Books, a Canadian AI publisher.

Written using Claude, illustrated with Midjourney v5.

I’m really happy with this book. It’s pure “old school” Quatriana of the best kind, and Claude did a fantastic job executing stories in the narrative space I set up for it. In fact, I consider this basically “canonical” Quatria lore at this point, with the exception perhaps of the queen chapter.

I arranged and executed this volume a bit differently. I had my initial set of images, many of which were based off other images I found and used /describe on. If you don’t know, that command generates text descriptions of the input image, then lets you generate new images based on those descriptions. But the results never come out looking remotely the same as what you put in. They will always share significant elements, but the look is always different.

Anyway, from there, I ended up with a few different clusters of related images, each of which I dumped into its own chapter. Then I used Claude to generate stories in my narrative space that made more explicit use of the visual themes of the images. I think it did a great job, and the result possibly feels less scattershot than how I’ve worked with images in the past.

Also definitely one of the most upbeat of all the AI books I’ve put out so far, as the Return of the Magicians in the context of Quatrian lore is a joyous event that heralds a new Golden Age. If only!

“Some of my best friends are pelicans”

New Trump pelican image set (Midjourney v5)

Kevin Kelly on Generative AI & Dreams Being the Default Mode

I’ve read and watched a ton on generative AI and what it’s all about, and I’d place Kevin Kelly’s article, “Dreams are the default for intelligence,” at or near the top of this list. It’s the most exciting and adventurous hot take on these topics I’ve seen.

Some choice excerpts:

I have a proto-theory: That our brains tend to produce dreams at all times, and that during waking hours, our brains tame the dream machine into perception and truthiness. At night, we let it run free to keep the brain areas occupied. The foundational mode of the intelligence is therefore dreaming.

And:

At any length, the AI stuff feels like dreams.

My conjecture is that they feel like dreams because our heads are using the same methods, the same algorithms, so to speak. Our minds, of course, are using wet neurons, in much greater numbers and connections than a GPU cluster, but algorithmically, they will be doing similar things.

It is possible that this whole apparatus of generation is actually required for perception itself. The “prompt” in ordinary sight may be the stream of data bits from the optic nerve in the eye balls, which go on to generate the “vision” of what we see. The same algorithms which generate the hallucinations for AI art — and for human dreams — may also be the heavy-duty mechanisms that we use to perceive (vs just “see”.) If that were so, then we’d need additional mechanisms to tamp down and tame the innate tendency for our visual system to hallucinate. That mechanism might be the constant source of data from our senses, which keeps correcting the dream engine, like a steady stream of prompts.

And:

During waking moments with the full river of data from all our senses, plus the oversight our conscious attention, the tendency of the generative engine to hallucinate is kept in check. But during the night, when the prompting from the senses diminish, the dreams take over with a different kind of prompt, which may simply be the points where our subconscious is paying attention.

Last one:

I suggest that the default state of this engine is to dream, and that it is managed during the day to not hallucinate. To dream, then, is not a higher order function, but the most primeval one, that is only refined by more sophisticated function that align it with reality.

He goes into some, I think, unnecessary stuff about eyeballs that is safe to exclude from the conversation, but the whole piece is fascinating regardless.

One possibility he does not seem to entertain here, though: that dreaming itself is aligned with reality, in that reality fundamentally has dreamlike qualities.

There’s also a worthwhile Aldous Huxley reference here to the notion of the brain as reducing valve for ‘Mind at Large.’

Each person is at each moment capable of remembering all that has ever happened to him and of perceiving everything that is happening everywhere in the universe. The function of the brain and nervous system is to protect us from being overwhelmed and confused by this mass of largely useless and irrelevant knowledge, by shutting out most of what we should otherwise perceive or remember at any moment, and leaving only that very small and special selection which is likely to be practically useful. According to such a theory, each one of us is potentially Mind at Large.

The Doors of Perception (Huxley)

I don’t see any reason it can’t be both: a filtering mechanism from the universal consciousness down to this specific one I’m inhabiting, and a generative engine to create experiential states based on sensory, perceptual, emotional, and cognitive inputs, etc.

I have more to say on this, but wanted to first set this post as an anchor on its own to refer back to.

Artist’s Statement

Hyperrealities and Uncanny Valleys: Provoking Dialogue through AI-Generated Imagery

(a.k.a. “I dream in image generations”)

Written with help from ChatGPT (v4)


Having worked for years in content moderation, my artistic practice delves into issues around the realm of AI image and text generation: embarking on (mid)journeys that explore the boundaries of photorealistic and hyperrealistic depictions of unreal events. By exploring speculative alternative realities, I construct visual narratives that portray political and other figures in hypothetical scenarios that have never unfolded within our objective consensus reality. Through this process, I aim to evoke curiosity and critical thinking among viewers, encouraging them to closely examine and question the information presented to them.

Humor and absurdity intertwine with my work, questioning the dichotomy between good and bad content and challenging societal limits through satire and parody. I firmly believe that the power of image making extends beyond aesthetics, potentially serving as a catalyst for profound societal discussions and inspiring real political change. Thus, I advocate for inclusive access and development, as well as encouraging diverse ownership and direction of these tools.

Within the context of postmodernism, my work ventures into the realm of hyperreality, blurring the boundaries between the real and the imagined. Deliberately engaging with the concept of the uncanny valley, AI-generated images can elicit a sense of uncertainty and discomfort. This unease prompts us to question the authenticity and reliability of what we perceive, particularly the information presented by those in positions of authority. It encourages a deeper examination of the potential consequences and implications of AI-generated content.

As a society, we must recognize that the challenges posed by AI-generated content and its potential misuses cannot be solved by a single technical fix, nor should we fall into the trap of relying on AI to tell us what is “real.” Instead, AI-generated content represents an ongoing condition that requires continuous adaptation and proactive effort. Rather than advocating for the cessation of these technologies, I believe that navigating this landscape requires critical thinking, adaptation, and open dialogue, fostering a more informed and discerning consumption and interpretation of AI-generated images, text, and other content.

My work aims to provoke thought, spark conversations, and contribute to the larger discourse surrounding the impact and implications of AI-generated content in our increasingly interconnected world. Everyone needs to have their say in this conversation so we can develop a more nuanced understanding of the role and influence of AI in our society.

Thinware & Thickware: Surfing the Holomorphic Continuum

Preface

This is something I came up with years ago while living in a basement in Seattle (c. 2005?). I never had a name for the concept of “thinware” (software) and “thickware” (hardware), and the ability to fluidly flip back and forth between the two. It seemed like a great science fiction concept, if nothing else. But how many sci-fi concepts have actually inspired real inventions? Tons! It’s almost like someone has to first imagine it, and then it can be built in our reality.

I used ChatGPT (v4) to help me come up with a name and better explanation for the whole idea package. Here it is.

What is the Holomorphic Continuum?

The Holomorphic Continuum is an innovative concept that reimagines the boundaries between the physical and digital worlds, challenging the traditional separation between hardware and software components in technology. This continuum represents a fluid spectrum, where hardware (thickware) and software (thinware) can coexist, interact, and transform into one another, effectively dissolving the barriers that have historically defined their roles and functions.

As technology advances, the need for more integrated and adaptive systems becomes increasingly apparent. The Holomorphic Continuum offers a framework for developing such systems by fostering a more unified and dynamic understanding of the relationship between hardware and software. By recognizing their potential for seamless interaction, this concept encourages the exploration of new possibilities in technology, extending beyond the limitations of current paradigms.

Within the context of the Holomorphic Continuum, an array of transformative technologies can be conceived, giving rise to novel applications and user experiences. One such example is the HoloMorphic System, a cutting-edge approach to the integration of hardware and software components, designed to facilitate greater adaptability, efficiency, and intuitive interaction.

Understanding Holomorphic Systems

A HoloMorphic System represents a novel approach to the interaction between hardware and software components in technology. The term “HoloMorphic” is derived from the words “holistic” and “morph,” suggesting a comprehensive and transformative relationship between the two domains. In this system, hardware and software exist along a fluid continuum, enabling seamless transitions between physical and digital states.

Key features of the HoloMorphic System include:

  1. Dynamic State Transitions: Components within the system can transition between thickware (hardware) and thinware (software) states through processes of instantiation (thickening) and de-instantiation (thinning). This allows for flexible adaptation to different tasks and requirements.
  2. Seamless Integration: The HoloMorphic System promotes a high degree of interoperability between hardware and software components, enabling more efficient communication and collaboration between different technological elements.
  3. Adaptive Functionality: The fluid nature of the HoloMorphic System allows for the development of adaptive technologies that can automatically adjust their form and function to better suit specific tasks or contexts.
  4. Enhanced User Experience: By blurring the lines between hardware and software, the HoloMorphic System has the potential to create more immersive and intuitive user experiences, allowing users to interact with technology in a more natural and seamless manner.

Possible applications for HoloMorphic Systems could range from advanced robotics, where physical components can morph to adapt to various tasks, to immersive virtual and augmented reality environments, where digital elements can be instantiated into physical forms to create more interactive experiences.

As a cutting-edge concept, the HoloMorphic System challenges traditional distinctions between hardware and software, opening the door to new possibilities in technological innovation and human-computer interaction.

More thoughts on Midjourney / Latent Space navigation

Did a couple posts on this already and wanted to continue the train of thought on the idea of handheld controllers for navigating latent spaces of image diffusion models like Midjourney (though theoretically the same concepts should apply for navigating any high dimensional space).

When this idea first landed, I worked extensively with ChatGPT v4 to flesh the concept out in a bit more detail, but I never captured that anywhere public. So here it is…

Core design challenges

After sketching out the general idea so we were both on the same page, I asked CGPT what the core design challenges are here. I won’t include every little piece of the response, but selected highlights are below.

The core design challenge in this project is indeed interpreting a high-dimensional latent space into a physically navigable form using a handheld controller with limited input options. Mapping the complexity of a multi-dimensional latent space onto a device with a restricted number of controls, while maintaining a smooth and intuitive user experience, can be difficult.

Some specific challenges include:

  1. Dimensionality reduction: Latent spaces of image diffusion models can have hundreds of dimensions. The controller needs to facilitate navigation through these dimensions effectively while only using a few input controls (joysticks, dials, and buttons). Developing a suitable method for dimensionality reduction, such as PCA or t-SNE, that retains meaningful features and provides an intuitive navigation experience, is essential.
  2. Interpolation and navigation: Designing a system that enables smooth transitions between images while navigating the latent space is challenging. Interpolation techniques, such as linear or spherical interpolation, must be carefully chosen and optimized for the best results.

I’m not actually sure what all that means, but following those trails led to some interesting and useful discoveries. I won’t bother trying to explain it and will instead pass the baton to CGPT.

Dimensionality Reduction

ChatGPT says on this topic:

Dimensionality reduction is a technique used to reduce the number of variables or dimensions in a dataset while preserving the relationships and structures within the data. In the context of your project, dimensionality reduction would be used to simplify the high-dimensional latent space of image diffusion models into a lower-dimensional representation that can be effectively controlled and navigated using the handheld controller.

PCA (Principal Component Analysis) and t-SNE (t-Distributed Stochastic Neighbor Embedding) are two common dimensionality reduction techniques.

PCA is a linear technique that identifies the directions (principal components) in the high-dimensional space where the variance is maximized. By selecting the top principal components, you can project the high-dimensional data onto a lower-dimensional space while preserving most of the data’s structure.

t-SNE is a nonlinear technique that aims to maintain the local structure of the high-dimensional data in the lower-dimensional space. It’s particularly useful for visualizing high-dimensional data in a 2D or 3D space, as it tends to cluster similar data points together.

In your project, you would need to develop a suitable method for dimensionality reduction that:

  1. Retains meaningful features: The chosen technique should preserve important relationships and structures in the latent space, so that the reduced representation remains meaningful and allows for intuitive navigation.
  2. Provides an intuitive navigation experience: The reduced representation should be easy to navigate with the handheld controller. Users should be able to explore the latent space and understand the relationship between different points in this space.
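As a rough illustration of what this reduction step might look like, here is a sketch of PCA in plain NumPy (via SVD): project high-dimensional latent vectors onto their top two principal directions, which a joystick’s X/Y axes could then drive, and map a 2-D controller position back into the full space. All the names and the 512-dimension size are illustrative assumptions, not anything from a real diffusion model.

```python
import numpy as np

def pca_reduce(latents: np.ndarray, n_components: int = 2):
    """Project (n_samples, n_dims) latent vectors onto top principal components."""
    mean = latents.mean(axis=0)
    centered = latents - mean
    # rows of vt are the principal directions, ordered by variance explained
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]          # (n_components, n_dims)
    reduced = centered @ components.T       # (n_samples, n_components)
    return reduced, components, mean

def to_latent(point_2d: np.ndarray, components: np.ndarray, mean: np.ndarray):
    """Map a low-dimensional controller position back into the full latent space."""
    return point_2d @ components + mean

# pretend latent vectors: 100 samples, 512 dimensions
rng = np.random.default_rng(0)
latents = rng.normal(size=(100, 512))
reduced, components, mean = pca_reduce(latents)
```

The key property for the controller use case is the `to_latent` inverse map: the joystick moves in 2-D, but every position still corresponds to a full 512-dim point that can be fed to the model.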

Are there any existing controllers for high dimensional spaces?

Before spending a ton of time reinventing the wheel here, I asked if it knew of any existing controllers which could already handle this kind of high dimensional navigation. It gave these ideas:

While there isn’t a specific controller designed solely for navigating high-dimensional spaces, several existing devices can be adapted or customized for this purpose. Some examples include:

  1. 3Dconnexion SpaceMouse: This 3D controller is designed primarily for navigating 3D environments in CAD, modeling, and design software. It features a pressure-sensitive puck that can be pushed, pulled, twisted, or tilted to control movement in up to six degrees of freedom (X, Y, Z, pitch, roll, and yaw). Although it was not designed specifically for high-dimensional latent spaces, it could potentially be adapted for such a purpose with custom software.
  2. MIDI controllers: MIDI controllers are used to control music production software and instruments, but they often feature a variety of knobs, sliders, and buttons that can be mapped to different parameters. These controllers could potentially be adapted for navigating high-dimensional spaces by assigning each control to a specific dimension or function.

The SpaceMouse actually looks pretty cool for 3D work like CAD.

And I saw elsewhere that those buttons are all assignable to keyboard shortcuts, so perhaps there’s something there… I’ve also experimented with using small MIDI keyboards as custom macro controllers in the past (basically just for an assignable bank of buttons), and found that to be pretty useful too. And I find the idea that music itself might just be a high-dimensional space to be pretty intriguing.
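To make the MIDI idea concrete, here’s a minimal sketch (an assumption on my part, not a real driver) of how control-change values could steer latent dimensions: each knob sends a 0–127 CC value, which gets mapped to a symmetric offset along whichever dimension that knob is assigned to.

```python
def cc_to_offset(cc_value: int, scale: float = 1.0) -> float:
    """Map a 0-127 MIDI control-change value to an offset in [-scale, +scale],
    centered so that the knob's midpoint (64) means 'no movement'."""
    return (cc_value - 64) / 64.0 * scale

class KnobBank:
    """Assigns each MIDI CC number to one latent-space dimension (illustrative)."""
    def __init__(self, assignments: dict):
        self.assignments = assignments  # cc number -> latent dimension index

    def apply(self, position: list, cc: int, value: int) -> None:
        """Nudge the current latent position along the knob's assigned dimension."""
        dim = self.assignments.get(cc)
        if dim is not None:
            position[dim] += cc_to_offset(value, scale=0.1)

bank = KnobBank({1: 0, 2: 7, 3: 42})    # three knobs, three dimensions
pos = [0.0] * 512                       # current position in a 512-dim space
bank.apply(pos, cc=1, value=127)        # knob 1 fully clockwise nudges dimension 0
```

In a real setup you’d feed this from an actual MIDI input library, but the mapping logic itself is this simple: knob → dimension, value → offset.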

User experience moving through High-D space

I then went in the direction of trying to nail down more specifics about what the UX might be like as you move from point to point within a latent or high dimensional space. Here’s excerpts around that:

Navigating through a latent space using a physical controller can be an immersive and engaging experience for the user. Here’s a step-by-step walkthrough of what the user experience might be like:

  1. Initial setup: The user connects the handheld controller to the software interface running on their computer. The software interface presents a visual representation of the latent space, with the current position highlighted or centered.
  2. Starting point: The user begins at a specific point in the latent space, which corresponds to a unique image generated by the image diffusion model.
  3. Exploration: Using the handheld controller’s inputs (joysticks, dials, and buttons), the user navigates through the latent space by moving along different dimensions. As they move, the software interface updates the displayed image in real-time, providing a fluid and responsive experience.
  4. Discovery: As the user navigates through the latent space, they will encounter images with varying degrees of similarity to their starting point. The user can find interesting, unique, or aesthetically pleasing images during their exploration.
  5. Adjustments: The user can use the controller’s dials to adjust settings like interpolation speed, image resolution, or level of detail. These adjustments can help fine-tune the navigation experience, making it more enjoyable and tailored to the user’s preferences.
  6. Saving and organization: If the user discovers an interesting image, they can press a dedicated button on the controller to save it to their image library. The software interface can include organizational tools, like folders or tags, to help users manage their saved images.
  7. Model or category switching: The user can press a button on the controller to toggle between different image diffusion models or categories, allowing them to explore various aspects of the latent space.

I’ll come back to this notion of categories in a bit.

Mapping Latent Space

First, I proposed that there should be some kind of map that gets generated from explorations, so you can see and reproduce the track you took during a session (or across sessions).

Adding a map that visualizes the user’s path of travel and marks saved images as points of interest would further enhance the user experience and provide additional context for their exploration of the latent space. Here’s how the map feature could work:

  1. Display: The software interface could include a separate panel or window displaying the map. The map can be a 2D or 3D representation of the reduced-dimensional latent space.
  2. Path visualization: As the user navigates through the latent space, the map can show their path of travel as a continuous line or a series of connected points. This provides a visual reference for the user’s exploration and can help them understand the relationships between different areas of the latent space.
  3. Saved points: When the user saves an image, the corresponding point in the latent space is marked on the map. This can be done using different symbols, colors, or icons to indicate the saved points. The user can hover over or click on these points to see a thumbnail of the saved image and access additional information or options (e.g., renaming, adding tags, or opening the full-resolution image).
  4. Zoom and pan: The user can zoom in or out and pan around the map to get a better view of their path and the saved points. This allows them to easily navigate and manage their saved images in the context of their exploration.
  5. History navigation: The map can also serve as a way to navigate through the user’s exploration history. By clicking on points along their path, the user can quickly return to previous locations in the latent space.
  6. Exporting the map: The user can have the option to export the map as an image or a file format compatible with other mapping or visualization tools. This can help them share their exploration with others or analyze their path in more detail.
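The map features above (path visualization, saved points, history navigation, export) could be sketched as a small session recorder. This is my own illustrative sketch, not anything CGPT produced; all names are hypothetical.

```python
import json

class SessionMap:
    """Records the path of travel through latent space plus saved points."""
    def __init__(self):
        self.path = []       # ordered list of visited latent-space points
        self.saved = {}      # step index -> user label for saved images

    def visit(self, point):
        """Log each point as the user navigates."""
        self.path.append(list(point))

    def save_current(self, label: str):
        """Mark the most recently visited point as a saved image."""
        self.saved[len(self.path) - 1] = label

    def jump_to(self, step: int):
        """History navigation: return the point at an earlier step."""
        return self.path[step]

    def export(self) -> str:
        """Export the session for sharing or later replay."""
        return json.dumps({"path": self.path, "saved": self.saved})

session = SessionMap()
session.visit([0.0, 0.0])
session.visit([0.1, 0.3])
session.save_current("nice ruins")
data = session.export()
```

Rendering this as a 2-D map is then just plotting `path` as a connected line and overlaying markers at the `saved` indices.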

What the user sees in real time

Got it to nail down more clearly what the user sees in real time as they navigate along dimensions…

When the user navigates from one point to another in the latent space using the physical controller, they will see a continuous and smooth transition of images in real-time on the software interface. The images displayed during this transition will represent intermediate points in the latent space between the starting point and the destination.

The real-time visualization of the user’s movement through the latent space can be achieved using interpolation techniques, such as linear or spherical interpolation. Interpolation allows for a smooth blending of features between the starting and destination images, creating a seamless sequence of images that gradually change as the user moves through the latent space.

As the user navigates, they will see the images evolving, with features from the starting and destination points mixing and merging in different ways. This real-time visual feedback helps the user understand the relationships between different points in the latent space and discover new, interesting, or aesthetically pleasing images during their exploration.
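The linear vs. spherical interpolation mentioned above can be sketched in a few lines. Slerp is often preferred for Gaussian-distributed latents because it keeps intermediate points at a sensible distance from the origin rather than cutting through the middle; this is a sketch with illustrative 2-D vectors, not model code.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation: straight line between two latent vectors."""
    return (1 - t) * a + t * b

def slerp(a, b, t):
    """Spherical interpolation: follows the arc between the two vectors."""
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return lerp(a, b, t)    # vectors (nearly) parallel: fall back to lerp
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
midpoint = slerp(a, b, 0.5)     # stays on the unit circle, unlike lerp's midpoint
```

Sweeping `t` from 0 to 1 as the joystick moves gives exactly the smooth sequence of intermediate images the quote describes.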

Mode switching & dimension assignment

I’m flying by the seat of my pants here imagining this device, but having used image diffusion models a lot (I’m up to 92 published AI art books), I have a pretty strong, if still intuitive, sense of how it might all work. One thing I suggested was that we’d need a way to rapidly switch controller modes and/or assign dimensions to manipulate on the fly. This might be tricky given that, for instance, even a single pixel could be considered a dimension (depending on the model). Here’s what CGPT offered around this topic:

You’re correct that allowing users to rapidly flip between or reassign different dimensions to manipulate as they travel through the latent space is important for an intuitive and engaging experience. Here are a few potential solutions to achieve this:

  1. Mode switching: Assign a button on the controller to switch between different modes, where each mode corresponds to a specific set of dimensions that the user can manipulate. In each mode, the controller’s inputs (joysticks, dials, etc.) can be reassigned to control the dimensions associated with that mode. The software interface can provide visual feedback indicating the current mode and the dimensions being controlled.
  2. Dynamic dimension assignment: Include touch-sensitive controls or additional buttons on the controller that can be used to cycle through dimensions for each input control. For example, a user could tap a button to cycle the X-axis joystick control through different dimensions, while another button could be used to cycle the Y-axis control. Visual feedback on the software interface can display the currently assigned dimensions.
  3. Custom presets: Allow users to create custom presets that define specific combinations of dimensions to be controlled by the inputs. These presets can be saved and loaded using the controller’s buttons or through the software interface. This enables users to quickly switch between their preferred dimension assignments during navigation.
  4. On-screen interface: Implement an on-screen interface that users can interact with directly using a mouse, touch screen, or another pointing device. The on-screen interface can be used to change the dimensions assigned to the controller’s inputs, providing a visual way to manage dimension assignments.

It also offered voice & gesture controls as additional options, but I don’t want to include either of those in initial design scope. To me those are just secondary ways of accessing primary functions controlled in the app.
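The mode-switching and preset ideas could look something like this in the app layer: each named preset maps the controller’s physical inputs to latent dimension indices, and one button cycles through them. A hypothetical sketch; the preset names and dimension numbers are made up.

```python
# each preset maps a physical input name to a latent dimension index
PRESETS = {
    "composition": {"joy_x": 0,  "joy_y": 1,  "dial_1": 12},
    "color":       {"joy_x": 33, "joy_y": 34, "dial_1": 35},
}

class ControllerModes:
    """Cycles through named presets, exposing the active input->dimension map."""
    def __init__(self, presets: dict):
        self.presets = presets
        self.modes = list(presets)
        self.mode_idx = 0

    @property
    def mapping(self) -> dict:
        return self.presets[self.modes[self.mode_idx]]

    def next_mode(self) -> str:
        """Single button press advances to the next preset, wrapping around."""
        self.mode_idx = (self.mode_idx + 1) % len(self.modes)
        return self.modes[self.mode_idx]

ctrl = ControllerModes(PRESETS)
ctrl.next_mode()    # now in "color" mode: joy_x drives dimension 33
```

The software interface would just read `ctrl.mapping` each frame to know which dimensions the joystick is currently driving, and show the active mode name on screen.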

What the app UI might consist of

I even got it to output very basic SVG code of the software layout, but it’s not worth sharing. So here’s the text description of what it thinks the app window would contain:

  1. Main window: The main window should be divided into two primary sections, the image preview area and the control panel.
  2. Image preview area: This section occupies the majority of the main window and displays the current image generated based on the user’s position in the latent space. It should be large enough to comfortably view the image and observe the real-time changes as the user navigates through the latent space.
  3. Control panel: The control panel can be located on the side or bottom of the main window and contain various interactive elements for controlling the software and adjusting settings. This panel can include:
    a. Dimension assignment: Include a list or grid of available dimensions, where users can click on a dimension to assign it to a specific controller input. The currently assigned dimensions can be highlighted or marked in some way.
    b. Model or category selection: Provide a dropdown menu or a list of available image diffusion models or categories for the user to choose from.
    c. Presets and modes: Include buttons or a dropdown menu for selecting custom presets or modes that define specific dimension assignments for the controller inputs.
    d. Interpolation settings: Provide sliders or input fields for adjusting interpolation speed, image resolution, or other relevant parameters.
    e. Navigation history: Display a visual timeline or a list of the user’s previous locations in the latent space, allowing them to click on a point to return to that location.
    f. Map view toggle: Include a button or checkbox to toggle the display of the map view, which shows the user’s path of travel and saved points in the latent space.
  4. Map view (optional): If the map view is enabled, it can be displayed as a separate panel or window, either floating or docked within the main window. The map should show the user’s path of travel, marked saved points, and allow for zooming and panning.

Agile user stories

I wanted to see just how far I could push it to do the job of the product manager, so I had it write up some Agile user stories describing the software product only:

  1. As a user, I want to see a real-time preview of the image generated based on my position in the latent space, so that I can observe the changes as I navigate.
  2. As a user, I want to assign different dimensions to my controller inputs, so that I can easily switch between controlling various dimensions in the latent space.
  3. As a user, I want to choose between different image diffusion models or categories, so that I can explore various aspects of the latent space.
  4. As a user, I want to save interesting images I discover during my exploration, so that I can revisit or use them later.
  5. As a user, I want to adjust interpolation settings like speed, image resolution, and level of detail, so that I can fine-tune my navigation experience.
  6. As a user, I want to view a map of my path of travel through the latent space, so that I can better understand the relationships between different areas and locate saved images.
  7. As a user, I want to access my navigation history, so that I can quickly return to previous locations in the latent space.
  8. As a user, I want to create custom presets for dimension assignments, so that I can quickly switch between my preferred combinations during navigation.

There are obviously technical issues to work out here, but as a PM reading those, they actually feel like something that could probably be built by an engineering team with suitable domain expertise.

Categories

Since it brought up ‘categories’ again, I finally inquired about what it meant by that, and its answer is very useful:

In the context of item 3, categories refer to different groups or subsets within the latent space that are associated with specific types of images, visual features, or content. These categories could be derived from the underlying image diffusion model, which may have been trained on various classes of images, or they could be based on user-defined groupings.

For example, if the image diffusion model was trained on a diverse dataset containing images of animals, landscapes, and buildings, the categories could be “Animals,” “Landscapes,” and “Buildings.” By selecting a category, the user would be able to focus their exploration within the latent space on images that correspond to that specific category.

Alternatively, users may create their own custom categories based on their interests or preferences. These custom categories could be defined by selecting and tagging images during the exploration, effectively creating a user-defined grouping within the latent space.

Allowing users to choose between different categories helps them to more efficiently explore the latent space, as they can focus on specific types of images or visual features that are of interest to them.
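A user-defined category along these lines could be modeled as nothing more than a tagged set of latent points whose centroid gives a rough "center" to steer exploration toward. A minimal sketch (the names are mine, and it assumes latent vectors are plain lists of floats):

```python
class CategoryIndex:
    """User-defined categories as tagged sets of latent points; the
    centroid of each set gives a rough center for focused exploration."""

    def __init__(self):
        self.categories = {}  # name -> list of latent vectors

    def tag(self, name, point):
        # Tagging an image during exploration adds its latent point
        # to the named category, creating it if needed.
        self.categories.setdefault(name, []).append(list(point))

    def centroid(self, name):
        # Average the tagged points dimension-by-dimension.
        points = self.categories[name]
        dim = len(points[0])
        return [sum(p[i] for p in points) / len(points) for i in range(dim)]
```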

Visit adjacent neighborhoods

One thing I always want, especially when exploring in Midjourney, is the ability to easily and effectively scan adjacent neighborhoods once I find a good area within the latent space. Here’s what CGPT says on how to potentially do that:

Local exploration mode: When the user enables local exploration mode, they will be presented with multiple smaller image previews that represent the adjacent neighborhoods in the latent space. These previews can be arranged in a grid or circular pattern around the current position.

Neighborhood selection: The user can click on any of the smaller image previews to instantly move to that adjacent neighborhood in the latent space. This will update the main image preview to reflect the new position and generate a new set of adjacent neighborhood previews.

Adjusting proximity: Provide a slider or input field in the software interface that allows the user to adjust the “proximity” of the adjacent neighborhoods. A smaller proximity value will result in neighborhoods that are closer to the current position in the latent space, while a larger value will display neighborhoods that are farther away. This enables users to fine-tune the scope of their local exploration.

While I’m at it, the idea of actually physically mapping out a high-dimensional space like this is not so crazy, it seems. I found this Google Arts & Culture experiment, which they’re calling a t-SNE map. It’s hard to do justice to this visualization in only 2D, but here’s a screenshot:

That’s a pretty zoomed out view of a landscape of visually similar images. Here’s a more zoomed in look:

The best thing is really to try it out on the website, but it seems like their sidebar items in the guided tour basically correspond to what CGPT describes as ‘categories.’

What I’m envisioning here is even more fluid and, for lack of a better word, “VR,” in that not only could you see the specific images like you can above, but it would fluidly move between them, with attributes (dimensions) receding or increasing as you travel.

Other modes of use

I asked about what other modes users might want in addition to local exploration, described earlier. It said:

  1. Global exploration mode: This mode allows users to explore the entire latent space more freely, without being restricted to a specific category or local area. Users can navigate in any direction, discovering a wide variety of images and visual features along the way.
  2. Category-focused mode: As mentioned earlier, this mode lets users focus on a specific category within the latent space, such as animals, landscapes, buildings, or user-defined categories. The navigation and exploration will be tailored to that specific category, making it easier for users to find images that match their interests.
  3. Favorites mode: In this mode, users can revisit their saved images and explore the neighborhoods around those points. This can help users to discover new images that are similar to their favorites or to fine-tune their saved images by navigating the local areas around them.
  4. Guided tour mode: This mode provides a pre-defined or user-generated sequence of points in the latent space that users can navigate through. The guided tour can be based on specific themes, popular locations, or user-created paths. This mode can be useful for users who are new to the latent space exploration or who want to follow a curated experience.
  5. Random walk mode: In this mode, the software automatically generates a random path through the latent space, with the user being able to influence the direction or speed of the movement. This can create a more relaxed and serendipitous exploration experience, where users can discover unexpected images and visual features.
  6. Search mode: If the underlying image diffusion model supports it, users can search for specific images or visual features within the latent space using keywords or example images. In this mode, the software would display search results as points in the latent space, which users can then navigate to and explore.
  7. Collaborative mode: This mode allows multiple users to explore the latent space together, either by sharing a single exploration session or by viewing and interacting with each other’s paths and saved images. This can create a more social and engaging exploration experience, with users being able to share discoveries and collaborate on finding interesting images.

Referent-based mode

As I explained to CGPT:

i keep visualizing the ability to like define a group of referents, points or features or whatever in the latent space (like a blue ball, an elephant, a storm, etc) and then… something… explore themes around those but with different configurations or treatments. does that make sense?

And it replied, partial:

What you’re describing is a mode that allows users to define a group of referent points, features, or themes in the latent space and then explore variations and relationships between those referents. This can help users discover new and interesting images that share similarities with the chosen referents but have different configurations, styles, or treatments.

Here’s a potential approach to implementing this idea:

  1. Referent selection: Provide a user interface element (e.g., a list or a grid) where users can add, remove, or modify referent points, features, or themes. Users can either manually select these referents by navigating the latent space or use a search functionality if the underlying image diffusion model supports it.
  2. Theme exploration: Once the referents are defined, the software generates a set of variations or combinations based on the selected referents. These variations can be displayed as a grid of image previews or a more interactive visualization like a 2D or 3D scatter plot.

As it went on (and I clipped some less useful parts), I got to wondering, isn’t this just the same thing as simple prompt mode? How would this be different than just entering words for the things I want as my referents?

The concept you described earlier is different from a simple prompt mode. Both modes are aimed at guiding users to explore specific themes or concepts in the latent space, but they approach this goal in different ways.

In a simple prompt mode, users input a textual prompt (e.g., a keyword or a phrase), and the software generates a single image or a small set of images related to the prompt. The exploration in prompt mode is more linear, focusing on the specific concept described by the prompt. Users can adjust the prompt to refine the resulting images or explore other concepts, but the focus remains on a single theme at a time.

On the other hand, the referent-based mode you described earlier involves defining a group of referent points, features, or themes in the latent space and then exploring variations and relationships between those referents. The exploration in this mode is more dynamic and interactive, allowing users to discover new and interesting images that share similarities with multiple referents, while presenting different configurations, styles, or treatments.

In the referent-based mode, users have more control over the exploration process, as they can adjust parameters, add or remove referents, and interact with the generated variations in a more flexible way. This mode also promotes a deeper understanding of the relationships and connections between different areas of the latent space, as users can visually observe and manipulate the variations and combinations of referents.

So I think this distinction might be like… text prompts could be used to get me to a particular place in the latent space (or to modify what I’m seeing already). Whereas, a referent mode might allow me to – as I’m exploring – say, select certain entities that I see within the visualization… and maybe add them to a basket or something that I can take along with me as I go. And those referents would show up or stick with me as I continue navigating the space, until I choose to release them. Something like that.
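One simple way to make a "basket" of referents actionable, assuming the latent space is linear enough that averaging points is meaningful, is to blend them into a single target location by weighted average and then explore the neighborhood around that target. A sketch with hypothetical naming of my own:

```python
def blend_referents(referents, weights=None):
    """Combine a basket of referent points into one target location by
    weighted average; exploring around that target should yield images
    mixing the referents' features. Assumes latent vectors are plain
    lists of floats and that linear interpolation is meaningful there."""
    if weights is None:
        weights = [1.0] * len(referents)  # equal influence by default
    total = sum(weights)
    dim = len(referents[0])
    return [sum(w * r[i] for r, w in zip(referents, weights)) / total
            for i in range(dim)]
```

Per-referent weight sliders would then let you dial each basket item's influence up or down as you travel, and "releasing" a referent is just removing it from the list.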

Short list of required app features

So, from there I had it go back through and distill all of this into a list of required features for the app:

Here is a short list of required features for the app, based on the discussions so far:

  1. Real-time image preview: Display the generated image based on the user’s current position in the latent space, updating in real-time as the user navigates.
  2. Dimension assignment: Allow users to assign different dimensions of the latent space to specific inputs or controls, enabling them to easily switch between controlling various dimensions.
  3. Category selection: Provide the ability to choose between different categories or subsets within the latent space to focus the exploration on specific types of images or content.
  4. Save images: Enable users to save interesting images they discover during exploration for future reference or use.
  5. Interpolation settings: Allow users to adjust settings like interpolation speed, image resolution, and level of detail to fine-tune their navigation experience.
  6. Exploration modes:
    a. Local exploration mode
    b. Global exploration mode
    c. Category-focused mode
    d. Favorites mode
    e. Guided tour mode
    f. Random walk mode
    g. Search mode (if supported by the image diffusion model)
    h. Collaborative mode
    i. Referent-based mode
  7. Map view: Display a map of the user’s path of travel through the latent space, with the ability to show saved points, referent points, and other user-defined landmarks.
  8. Navigation history: Provide access to the user’s navigation history, allowing them to quickly return to previous locations in the latent space.
  9. Custom presets: Allow users to create and manage custom presets for dimension assignments, making it easy to switch between preferred combinations during navigation.
  10. Adjustable proximity for local exploration: Provide a slider or input field to adjust the proximity of adjacent neighborhoods during local exploration.
  11. Help and tutorials: Offer in-app help and tutorials to assist users in understanding and utilizing the various features and exploration modes.

Minimal feature set for a v1

So from that, I asked it to reduce those features for a stripped down v1 prototype/proof of concept, and it said:

  1. Real-time image preview: Display the generated image based on the user’s current position in the latent space, updating in real-time as the user navigates.
  2. Dimension assignment: Allow users to assign different dimensions of the latent space to specific inputs or controls, enabling them to easily switch between controlling various dimensions.
  3. Save images: Enable users to save interesting images they discover during exploration for future reference or use.
  4. Local exploration mode: Provide a focused exploration mode that allows users to navigate the immediate neighborhood around a specific point in the latent space.
  5. Global exploration mode: Offer a more free-form exploration mode that enables users to explore the entire latent space without restrictions.
  6. Adjustable proximity for local exploration: Provide a slider or input field to adjust the proximity of adjacent neighborhoods during local exploration.
  7. Navigation history: Provide access to the user’s navigation history, allowing them to quickly return to previous locations in the latent space.
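The navigation-history feature (item 7) maps neatly onto browser-style back/forward stacks. A minimal sketch, with names of my own invention:

```python
class NavigationHistory:
    """Browser-style back/forward history over latent-space positions."""

    def __init__(self):
        self._back = []
        self._forward = []
        self.current = None

    def visit(self, point):
        # Moving to a new point pushes the old one onto the back stack
        # and invalidates anything on the forward stack.
        if self.current is not None:
            self._back.append(self.current)
        self.current = list(point)
        self._forward.clear()

    def back(self):
        if self._back:
            self._forward.append(self.current)
            self.current = self._back.pop()
        return self.current

    def forward(self):
        if self._forward:
            self._back.append(self.current)
            self.current = self._forward.pop()
        return self.current
```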

Wrap-Up

There’s a lot to digest in all of this of course. Apart from just the interesting aspects of the technical challenges described in this product design, the thing I’m most excited about is just how far you can take ChatGPT in a product design & planning session. The answer, at least in v4, is really really damn far.

Then, if it’s something that is amenable to the types of visual representations that Midjourney is capable of, this starts to become a pretty powerful set of tools. I suspect once they publicly release the image description tools for GPT-4, you’d also gain other possible vectors of insight into this design iteration process.

All told, this direction of a controller (hardware + software) specifically for Midjourney, but adaptable to any high-dimensional space or model, is a really compelling one for me as an artist making heavy use of the tools in the somewhat primitive state we see them today.

I’m not going to guarantee that I’ll be the one who builds it, but perhaps once GitHub Copilot X is released, I’ll be that much closer to actually being able to build some aspects of this down the road. If nothing else, I’m hoping these blog posts exploring these topics can inspire other innovations in the field. Exciting stuff!

Notes on Occupy AI

Occupy AI is the 91st volume in the AI lore series by Lost Books, a Canadian publisher specializing in AI-assisted art books.

This volume uses ChatGPT v4 with Midjourney v5 to imagine what might happen if a radical direct democracy group occupied the headquarters of prominent AI labs, demanding transparent direct control of the technologies by those most impacted by them, ordinary people. The book grew out of recent blog posts exploring related issues.

It’s fun to take this exploration out of the realm of pure hypotheticals, and into a sort of fictional sandbox which can act as a test bed or proving grounds for some of these ideas and societal conflicts. Fiction is great for this because it’s so much faster than having to wait around for reality to catch up, and nobody gets hurt from the exercise.

Unduly concentrating power

Not to pick on OpenAI, but I think their position requires that they can withstand legitimate criticism. I keep coming back again and again to their charter, which gives a shape to what their organization is supposed to aim for. This part just echoes in my brain:

We commit to use any influence we obtain over AGI’s deployment to ensure it is used for the benefit of all, and to avoid enabling uses of AI or AGI that harm humanity or unduly concentrate power.

Unduly concentrating power.

To quote a ZDNet article on the release of GPT-4:

In the GPT-4 technical report published Tuesday, alongside the blog post by OpenAI, the firm states that it is refraining from offering technical details because of competitive and safety considerations. 

“Given both the competitive landscape and the safety implications of large-scale models like GPT-4,” it writes, “This report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.”

Let’s put aside the safety question for a minute, because I personally don’t believe that any closed source single-owner AI system can ever even truly be termed “safe” in a meaningful way.

Instead, let’s just focus on that mention of “competitive landscape.”

Closing not just your source, but even technical explanations of your model, because of competition is literally an example of unduly concentrating power.

If their goal is moving towards AGI which benefits all of humanity, they’re already on the wrong path. Can they get back on the right one, one in line with their own charter?

Democratic Control of AI

Something I failed to get into in my last post about how we know when it’s time to socialize AI:

Well, actually I’ve written a bit about these problems of consent & coercion w/r/t the apparatus of the state in the past, so I won’t rehash them all again. I’ll just randomly toss out links and let the future figure it out:

Like, there are a lot of precursor conditions that would need to be met before democratic control over AI would actually even be a “good thing.”

We take it for granted in these discussions that (1) we actually have true democracy (we don’t), (2) that the control mechanisms we’ve attached to democracy are actually good…

Like I hate to break it to you, but have you looked at some of these so-called lawmakers as people? They are… not ideal specimens of humanity that we should all strive to mimic.

When people say they want democratic control of AI, do they mean by Ted Cruz? Cause I think he would be happy to speed us along toward that future, but I don’t think it’s one we actually really want…

Secondly (fourthly? fifthly? I lost count), coupling the power of decision-making to the yoke of popularity necessarily attracts power-grabbers. It does not necessarily attract people who are good decision-makers, let alone lawmakers.

So, if we really want democratic control of AI, I think we need to spend some more time on the core problems of democracy.

What if, for example, we uncoupled the popularity contest part that politicians like from the lawmaking part. So you could “be” a politician, and seek status, and gain influence, etc. Be a voice. But you would have effectively no decision-making power. Maybe once you held office, you wouldn’t even be allowed to vote anymore. I don’t know. I’m just throwing out wild ideas here. It just seems like putting popularity at the helm of the law is a really big design flaw we’re just willfully ignoring.

And then of course, not holding the so-called stewards of the law to even the most obvious basic ethical principles… god don’t get me started.

Also, there’s a who’s-to-say argument here: with something so pivotal as the mass-scale introduction of AI into global society, how do we know that a bunch of people voting directly, or voting for someone else to do their voting for them, would actually lead us toward “good” outcomes? While some of the immediate harms are obvious and predictable, the next-order effects and the ones after that become more and more chaotic.

Arguably, I think we can still say the systems should be accountable and responsive to the people they affect. That seems like a given. A corporate capitalist structure does not optimize for that as an outcome, so unless we choose to change that basic structure to optimize for other outcomes, we’ll get what we get.

This direction seems to slip into a sort of technocratic corporatist fantasy, where corporations might argue they have greater expertise to run these complex systems than simple ordinary dumb citizens whose lives will be impacted from every direction by them.

It’s a conundrum that we will have to puzzle our way through one way or another without being able to see the bigger picture we’re assembling until it’s all finished.

