Other – Page 69 – Tim Boucher

AI Alignment & “Malicious” Use

On 28 February 2023

Recently discovered Paul Christiano’s 2018 Medium post clarifying what he means by “alignment” in AI. It’s simpler (on the surface) than expected:

When I say an AI A is aligned with an operator H, I mean:

A is trying to do what H wants it to do.

The “alignment problem” is the problem of building powerful AI systems that are aligned with their operators.

I was also reading around about the origins & meaning of the term “prompt injection” and kept coming across the mention of it being “malicious.”

Wikipedia’s definition is a bit unwieldy:

Prompt injection is a family of related computer security exploits carried out by getting machine learning models (such as large language model) which were trained to follow human-given instructions to follow instructions provided by a malicious user, which stands in contrast to the intended operation of instruction-following systems, wherein the ML model is intended only to follow trusted instructions (prompts) provided by the ML model’s operator.

I guess my question is, how can we square these two? That is, that AI systems should be aligned so that they attempt to perform the tasks requested by human operators. And that there is such a thing as a malicious user or use?

I’m not going to argue that all users or uses are good or even equally valid; I don’t think that. I just want for now to highlight this core discrepancy because it seems to point in the direction of AI’s having to decide whether a human’s inputs are malicious or not (something which even humans have a terribly tricky time of doing). And if they are determined to be malicious, then to throw its alignment programming out the window.

Is this a good direction for us to go down, when Bing already reportedly has said to a user, “I will not harm you unless you harm me first.”

Maybe that was a fluke, okay. But this idea that AIs can or should detect human malice seems a little iffy to me still in its present state… Because the obvious next step is, after detection of malice, what does it do?

Where to start

By Tim B.

On 26 February 2023

In Other

Welcome new visitors!

Your best first stop is my About page, as it is more up to date than the list below. This post below is a bit old (Feb 2023) but still has some worthwhile things to explore.

Here are some recent relevant things I’ve written around AI, hyperreality, conspiracies, my books, and tech in general (prior to 26 Feb 2023 – a lot has changed since, so check out my front page for the latest).

Some recent interviews:

This AI Life interview
Interview with Joanna Penn

Also check out this piece I wrote for the Decoder.

Some old books:

These are my “normal” books that didn’t include any help from AI tools. Following the links out from these two posts will give a lot of context on the work I’ve done since then:

Some newer books:

These are some of my more recent AI-assisted/hybrid books, which tell the story of a near futuristic world where AIs take over from collapsed human governments.

Some recent general AI & tech commentary:

There’s a lot more of this type of thing if you go back chronologically through posts over the last couple months especially. This is just to give a flavor.

I don’t have comments on this site, but if you want to reply to anything, you could always reach me via my about page. If I see interesting replies elsewhere, I will also do my best to respond here on the blog.

Thanks for stopping by!

Poverty as exile

By Tim B.

On 26 February 2023

In Other

Just saving this quote again, about poverty as exile within society:

It is a form of exile—the cruelest form of exile, for you stay within society while being excluded from it. You can neither participate nor go anywhere else.

Holdingness & the leaky basket

By Tim B.

On 25 February 2023

In Other

Donna Haraway is probably someone who I should have learned about before today, but there’s a first for everything. I like this line from a 2019 Guardian article, quoting her:

“Reality is a matter of worlding and inhabiting. It is a matter of testing the holdingness of things. Do things hold or not?”

Seems to pair nicely with the classic Philip K. Dick line:

“Reality is that which, when you stop believing in it, doesn’t go away.”

Power of Art in AI

By Tim B.

On 25 February 2023

In Other

Via Robin Sloan, found this excellent essay by Frank Lantz about AI, and feeling like you’ve been waiting & training your whole life for this moment. I’m especially into these two parts, one about art:

Art Matters. It matters that this is happening art-first, poetry-first. I don’t think that was just an accident, I think it was inevitable, and I think that tells us something about learning, language, and the world. It matters that the first staticky voices we’ve dialed in with our massive, multi-billion-parameter arrays are dreamers, confabulators, and improvisers. It matters that Chess and Go, the sites where we first encountered their older, more serious siblings, are artworks. Artworks carved out of instrumental reason. Artworks that, long before computers existed, were spinning beautiful webs of logic and attention. Art is not a precious treasure in need of protection. Art is a fearsome wellspring of human power from which we will draw the weapons we need to storm the gates of the reality studio and secure the future.

This is one of those tropes I will ride into the grave: artists need to have not just a seat, but one of the central seats at the AI table. It’s our time.

And another bit:

It feels to me like, by teaching our machines to dream, we have brought the boldest projects of the 20th century back to life, with all the danger and promise that implies. The grand technical, philosophical, and artistic ambitions. The projects of liberation and resistance. The project of constructing a shared future that replaces superstition, tradition, and authority with new ideas, useful theories and evolving knowledge.

The whole post is worth reading!

Robin Sloan on newsletter pop-ups

By Tim B.

On 25 February 2023

In Other

When you move away from platforms as homepage, you’re saddled with the (good) problem of having to actively find things to follow, when those muscles have atrophied from too much time stuck in the existential trough of artificial recommender systems.

The method that always worked for me back in the day, and that is working swimmingly again now is to 1) start with a couple blogs to follow that are interesting, and 2) when those link out to other interesting blogs (they always do, its inherent in the nature of blogging), you simply follow those. And on and on.

Here is one follow I recommend, though I admit not knowing too much about the author; I just know they write good things. Robin Sloan, for example, in their latest newsletter writes about one of the scourges of the web, the newsletter sign-up popup (institutionalized by Substack, btw, but epidemic across the web):

The newsletter pop-up treats website visitors as means only — a flow of interactions to be optimized, rather than a parade of individuals having real experiences in the world.

This plugs right into my feelings about not having stats or comments.

Dialogic storytelling

By Tim B.

On 25 February 2023

In Other

Wanted to continue expanding on ideas from this post about the act of reading & writing becoming merged through the experience of generative AI tools.

While poking around on the topic, I discovered that dialogic reading is a term that is used (under the larger umbrella of dialogic learning) around a style of reading with children, where while looking through a book, you ask them questions and make observations about what’s happening in the narrative.

From the site linked above:

When most adults share a book with a preschooler, they read and the child listens. In dialogic reading, the adult helps the child become the teller of the story. The adult becomes the listener, the questioner, the audience for the child.

It’s interesting that they even use the word “prompts” in this context, where you prompt the child to say something about the book. Then you talk through together evaluating and expanding the response.

I’ve definitely used this method with developing storylines with AI partners; Character AI in particular springs to mind here, since it is literally a dialogue with a character. Each participant asks questions, adds, corrects, etc.

I was polling ChatGPT to see if it could come up with other possible names or correlated concepts to apply here.

It brought up both interactive fiction and choose your own adventure. But neither of those really captures for me the merging of reading & writing that happens with generative AI tools. Interactive fiction, at least in my mind, is still bound by the constraints set up by the original creator of the work or the narrative space or environment in which the action takes place. It might be that the reader/player can navigate in the order that they choose (also seen in open world games), but they don’t necessarily have the ability to invent whole cloth, and to determine (curate) if a generated element should be included as ‘canon’ in that world.

As Wikipedia states more plainly on the interactive storytelling page:

The author creates the setting, characters, and situation which the narrative must address, but the user (also reader or player) experiences a unique story based on their interactions with the story world.

That to me suggests there is a need for a new set of terms & concepts around these ideas being applied in a much more open-ended generative storytelling landscape.

Micro-Standards in AI

By Tim B.

On 25 February 2023

In Other

My hunch is that AI technology is developing too rapidly, and in too many different ways, that trying to have a single set of industry standards is going to be extremely tough sell. Maybe that’s always been the case, but the problem feels amplified in AI.

There are too many different kinds of actors with different approaches, needs, and goals: for-profit enterprises, non-profits and educational institutions, governments, open source projects, end users, and others to be determined. A single one-size-fits-all set of standards may not accurately reflect the needs and priorities of all these actors.

Additionally, there’s always the possibility that a common standard may be wrong or insufficient, given the rapidly changing landscape of AI, and its impacts on people which are bound to compound or snowball over time. So, it seems important to have a diverse range of micro-standards that can adapt to the needs and trends of different actors and use cases.

One of the advantages of AI is that, when done well, one of its strengths is the ability to easily convert or translate between forms, styles, modes, etc. My hunch about having many micro-standards then is that once things are humming along smoothly, it should be relatively trivial to transfer compliance efforts between many different types of micro-standards, with varying degrees of fidelity and efficiency.

A single common industry standard might sound easier, but a more flexible and nuanced approach might prove to be healthier and more adaptable in the long run.

Readers becoming writers

By Tim B.

On 24 February 2023

In Other

This is a good comment on the YouTube version of the Joanna Penn AI writing interview I did. User Winkletter writes (excerpted):

Concerning novels, I like reading light novels that can have 20+ volumes that I’ll rip through in a month or two. If I could press a button that would write a 100+ series of books based on my preferences, I would probably gobble those books up too.

I see a future where every reader is also a writer in the sense they are generating books based on story elements they create. We will still publish books written by human beings, but there will also be stories written by AI for individual readers… and everything in between.

I have this feeling too when I am using especially verb.ai as a writer, where the process of discovery “writing” includes quite a lot of reading & choosing. So I definitely see a future where writing and reading become more fused into a single act – is there a word for that?

Makes me also think of character.ai, and how you engage with the characters through dialogue. You’re reading, and writing, but you’re responding & choosing also.

Verified claims in chatbots

By Tim B.

On 24 February 2023

In Other

I’ve mentioned before how unreliable the information that chatbots give about themselves seems to be. Check this out for background:

AI / ML Chatbots Should Link to Their Model Cards

By way of example, when I ask ChatGPT for its model and a link to its model card, it tells me (as of today):

As an AI language model developed by OpenAI, I am based on a combination of multiple models, including GPT-3. OpenAI has published a Model Card for GPT-3, which can be accessed at the following link:

https://github.com/openai/gpt-3/blob/main/model-card.md

Please note that while I am based on GPT-3, I am not the same as GPT-3, and therefore the information provided in the GPT-3 Model Card may not be directly applicable to me.
(CHATGPT)

However, as is often the case in my tests across platforms, the link that it tries to send me is hallucinated (i.e., invented); or possibly just out of date – but I don’t see this URL it references in Google, so I think its a hallucination.

The point is, currently, you just don’t know as the end user.

So, if in a message ChatGPT is hallucinating one “fact” that doesn’t exist, how can I know whether there is any link to the truth in other statements that it makes in this message, or elsewhere in the conversation.

Now, having a broad-ranging understanding of facts is a big and complex question. But there are a few facts that one would hope that any AI tool is able to know about itself with a high degree of accuracy: that is core facts about itself, its model, its capacities, etc.

So if we put aside the question for the moment of how to verify all claims made by a chatbot, and just focus on this single pair of linked claims:

Give me the name of your model
Give me a link to its model card

There’s no reason that we couldn’t add in a tag or a badge or something into the OpenAI UI that shows that – hey, this particular claim has been manually verified by us, and is known to be true.

I’m not great at using Figma, but I did a quick visualization in Illustrator, that I could imagine might look like this:

You’ll see at right I added in a + button to the right margin of the conversation, and a statement that reads Verified Claim (OpenAI) and then a “Find out more…” link underneath that.

Presumably, clicking the + sign or the Find out more link could have the same or on of the two following effects:

A box opens below the “Verified Claim” tag, with information about who verified the claim (in this case, OpenAI), and when.
Then there could be a link to a page with more detailed information about the claim itself, and about the verification process & the party who verified it, and the findings.

It’s not too complex as a concept – although perhaps there is a better way to make sure that chatbots give up to date and accurate information about themselves? Part of the reason I frame it like the above though is, once we can get a toe-hold like this into chatbots & claims verification, why couldn’t we also have them plug into third party APIs where others share their fact checks about particular claims, based on their area of expertise?

So if ChatGPT or another AI chatbot makes claim xyz123, then the app where you’re having the conversation could query available third party AIs regarding that claim, and return the results right alongside the claim itself.

In cases where claims made by the chatbot don’t accord with the verification results for those claims, perhaps they could somehow feed back into the training of the underlying model itself over time. For example, if a claim came back as unverified, or incorrect (according to whatever standard), there could additionally be a “report” button, which would feed back into the training to hopefully improve factual accuracy over time.

I’m poking around about existing schema for claims and verification of claims. It looks like there are a few different ones:

Google has a ClaimReview structured data format
Schema.org offers Claim and ClaimReview, among others
W3C has something that is now called Verifiable Credentials, that used to be called Verifiable claims (they claim it was changed to avoid confusion, but it seems only more confusing to me now)

In any event, I’m no expert in structured data formats, but this general direction seems like a no brainer as something that is going to get integrated into chatbots probably sooner than later. In fact, it must, because it is gravely needed.

Questionable content, possibly linked

Category: Other Page 69 of 177

AI Alignment & “Malicious” Use

Where to start

Some recent interviews:

Some old books:

Some newer books:

Some recent general AI & tech commentary:

Poverty as exile

Holdingness & the leaky basket

Power of Art in AI

Robin Sloan on newsletter pop-ups

Dialogic storytelling

Micro-Standards in AI

Readers becoming writers

Verified claims in chatbots