One of the annoying things about using AI, in a way, is that because you have these newly increased capacities to execute on certain kinds of tasks, you end up progressing on many of them maybe more rapidly than you can handle. Like, I end up with all these different conversation threads about different projects, ideas, complaints, articles, interviews, job search, etc. It’s all ultimately “connected” but in ways that are completely organic and unique and deeply human.
Which the actual experience of using LLMs via chat interfaces is… not. It simulates humanness in its chatty capacity, but because of this UX framework, if you’re trying to compose documents in it, it becomes deeply frustrating to track text versions, changes, best phrasings or arguments, etc. And ChatGPT consistently drops specific things I need it to keep, and I have to go remind it again and again and again. Then I have to pop out to some other text editor and try to make sense of it all.
It is VERY tedious and annoying as an interaction paradigm for the actual work of writing. So I’ve been trying to get it to help me code a Firefox extension, which can run alongside the conversation in a sidebar. You would be able to highlight parts of the conversation and hang onto them in various ways. One thing you could do would be marking a certain part of the text as “canonical” or required, with the eventual aim being that you could then use this to like “force-include” certain elements in the conversation somehow (not really sure how the technical side of this would work). And this would hopefully reduce much of the time spent arguing with the model to go back and check all the things that it dropped from prior drafts of the document being workshopped.
The core of this is really simple: just taking clips of LLM conversations, holding them, giving them different status and sequence, being able to add notes, and then being able to export copy-pasteable text (with images would be ideal down the road) into other formats like a Google Doc (I mostly use Dropbox Paper for my regular random writing things myself), a Word doc, a blog post, or a Vellum doc (like I use for my ebooks). And ideally you could also export metadata along with it, maybe something like Adobe Content Credentials, or even markdown or similar markup to show which parts of the text were AI-generated (AI attribution), complete with timestamps (if you want that; I can imagine use cases where you might, like applying for copyright of AI-assisted works with the US Copyright Office).
I can even see a link here to certain elements of the Xanadu project, like where clips of saved conversation elements might be a transclusion of the original, and then when you export your clips and notes as a proto-document, it basically includes an edit decision list from the source conversation, which is the underlying hypotext from which this becomes a super-imposed hypertext, in the original meaning of that word. {See also: Intertext Protocol}
Anyway, I’m spinning out the directions this could go, but the fact is even with AI, I’m still not a developer. I can emulate one in certain regards, and experimenting with ChatGPT to build very simplified prototype versions of this is teaching me a lot. I’ve hit some early technical roadblocks around how such an extension would actually work were it to run entirely locally in the browser. Things like CSP and CORS, which I am still struggling to figure out how to handle. ChatGPT will give me endless run-around solutions and changes to implement in the code samples it gives me. Then it again constantly drops elements. And it goes on and on, and clearly the system can’t figure out what the problem is and is just confidently guessing. It’s tiresome and tedious as an end-user just trying to execute quickly on a given task which should be relatively simple, if one had adequate knowledge to solve it. Which I might not have, but the system certainly does.
Anyway, after much back and forth, I got it to produce this very basic product spec for how this Firefox extension might work (that’s the browser I use, so it’s what I’m building for). Maybe somebody out there will be able to get it to produce a better actually working version of this more easily than I would as a non-developer, and would share it out into open-source land. That’s the dream, so here it is:
Context:
- Firefox extension that runs in the sidebar of an LLM chat, enabling users to capture, manage, and export important text clips.
Sections and Detailed Functionality:
- Clip Maker: Popover appears when text is selected, offering the options to hold, save, or mark as canonical (verbatim or summary format). Clips are either temporarily held or saved.
- Note Taker: Add notes linked to each clip, either at the time of saving or later. Notes can be edited or deleted.
- Clip & Note Manager: Displays saved and held clips, allows reordering, adding notes after the fact, and toggling between temporary, saved, and canonical status. Users can also jump back to the point in the conversation where the clip was taken.
- Clip Exporter: Export all clips and notes into a linear text block for external use.
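To make the spec a bit more concrete, here is a minimal sketch of what the data behind the Clip Maker, Note Taker, and Clip & Note Manager might look like. Everything here is an assumption for illustration, not working extension code: the names (`Clip`, `ClipManager`, `sourceAnchor`) are made up, "held" stands in for the temporary status, and it deliberately leaves out the browser side entirely (selection popover, sidebar, storage).

```typescript
// Hypothetical data model for the spec above; all names are illustrative.
type ClipStatus = "held" | "saved" | "canonical"; // "held" = temporary
type ClipFormat = "verbatim" | "summary";

interface Clip {
  id: number;
  text: string;
  status: ClipStatus;
  format: ClipFormat;
  note: string;         // empty string means no note yet
  sourceAnchor: string; // assumed: some identifier used to jump back to the conversation
  createdAt: string;    // ISO timestamp of when the clip was taken
}

class ClipManager {
  private clips: Clip[] = [];
  private nextId = 1;

  // Clip Maker: create a clip from selected text.
  add(text: string, status: ClipStatus, format: ClipFormat, sourceAnchor: string): Clip {
    const clip: Clip = {
      id: this.nextId++,
      text,
      status,
      format,
      note: "",
      sourceAnchor,
      createdAt: new Date().toISOString(),
    };
    this.clips.push(clip);
    return clip;
  }

  // Note Taker: add or edit a note; pass "" to delete it.
  setNote(id: number, note: string): void {
    this.find(id).note = note;
  }

  // Manager: switch between held, saved, and canonical.
  setStatus(id: number, status: ClipStatus): void {
    this.find(id).status = status;
  }

  // Manager: reorder a clip to a new position (clamped to the list bounds).
  move(id: number, newIndex: number): void {
    const from = this.clips.findIndex((c) => c.id === id);
    if (from === -1) throw new Error(`No clip with id ${id}`);
    const [clip] = this.clips.splice(from, 1);
    const to = Math.max(0, Math.min(newIndex, this.clips.length));
    this.clips.splice(to, 0, clip);
  }

  // Manager: current clips in display order.
  list(): readonly Clip[] {
    return this.clips;
  }

  private find(id: number): Clip {
    const clip = this.clips.find((c) => c.id === id);
    if (!clip) throw new Error(`No clip with id ${id}`);
    return clip;
  }
}
```

The idea is that the sidebar UI would only ever call these few methods, so the fiddly browser parts could be swapped out without touching how clips are stored and ordered.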
Or phrased in a more active user voice:
Clip Creation: “When I select text, show options to hold, save, or mark as canonical. Choose exact or summary format.”
Note-Taking: “Add or edit notes linked to the clip or separate, either during or after saving.”
Clip Management: “Display all saved and held clips. Let me reorder, change the clip’s status (temporary, saved, canonical), and jump back to the original conversation point.”
Clip Exporting: “Export all clips and notes as a linear text block for easy external use.”
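The export step is probably the easiest piece to sketch. The version below is only one guess at what a "linear text block" could mean: clips in order, separated by blank lines, each followed by its note if it has one. The `ExportableClip` shape, the `exportClips` name, and the `[canonical]` tag are all assumptions of mine, not part of the spec.

```typescript
// Hypothetical exporter; the clip shape and the "[canonical]" tag are assumptions.
interface ExportableClip {
  text: string;
  status: "held" | "saved" | "canonical";
  note?: string;
}

// Turn an ordered list of clips into one copy-pasteable block of plain text.
function exportClips(clips: ExportableClip[]): string {
  return clips
    .map((clip) => {
      // Flag canonical clips so they stay visible after pasting elsewhere.
      const tag = clip.status === "canonical" ? "[canonical] " : "";
      const body = tag + clip.text.trim();
      const note = clip.note?.trim();
      // Only add a "Note:" line when there is a non-empty note.
      return note ? `${body}\nNote: ${note}` : body;
    })
    .join("\n\n");
}
```

Calling `exportClips` on whatever the manager currently holds would give you a single string to paste into a doc or blog post; attribution metadata and timestamps could be added as extra lines per clip later.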
Anyway, I wanted to put this out into the universe, as I think there can and should be many solutions to this kind of thing. It’s a kind of universal problem associated with the chat interface that currently dominates LLM product design, for better or worse, and often worse for tasks with structured outcomes like writing a document, article, or letter. Hopefully by sharing thinking around the problem and open-sourcing the solutions people come up with, we can converge on adequate solutions that enable creativity rather than hinder it, as it sometimes feels like these systems do, even for someone like me who uses them a lot.
If you end up building an extension like this for Firefox, or want to collab on one, drop me a line here.