In my survey of the field, I have found surprisingly few resources discussing the concept of AI attribution. One of the only clear discussions of it specifically comes from an article by Tom Barrett.
It mainly covers a fairly common trope: the idea that information should come with "nutritional facts" labeling, which has been circulating for years in fact-checking and credibility-indicator circles. The new element it connects this to, however, is machine learning model cards, a concept that appears to have been created and largely colonized by Google, though Google says it should be a shared standard. (Other sites like Hugging Face and Kaggle do use them.)
Barrett proposes a miniaturized version of model cards, which could presumably travel along with web content generated by a given model:
In mini-model cards, we can bring together the attribution — hat tips to the people and businesses who created the technology — and the provenance of the technology — the story behind the tech, links to ethics frameworks and technical specs.
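To picture what such a card might contain, here is a minimal sketch that maps Barrett's two halves, attribution and provenance, onto a small data structure. It is hypothetical throughout. The class name, the field names, and the example.com URLs are my own illustrations and are not part of Barrett's proposal or of any published model card format.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class MiniModelCard:
    """Hypothetical mini-model card; field names are illustrative only."""
    # Attribution: hat tips to the people and businesses behind the model.
    model_name: str
    creators: list[str]
    # Provenance: the story behind the tech, kept small by linking out.
    full_model_card_url: str
    ethics_framework_url: str
    technical_spec_url: str


card = MiniModelCard(
    model_name="example-model-v1",
    creators=["Example Lab"],
    full_model_card_url="https://example.com/model-card",
    ethics_framework_url="https://example.com/ethics",
    technical_spec_url="https://example.com/spec",
)

# Serialize compactly so it could, in principle, ride along as metadata
# attached to a generated piece of content.
print(json.dumps(asdict(card), separators=(",", ":")))
```

The card stays "mini" by linking out to the full documents instead of embedding them. That is also its weak point. Producing the card is easy. The hard part is keeping it attached to the content.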
It's a good idea in its most basic form, but as with C2PA, a great deal depends on how the industry implements "travels along with" for metadata that is meant to be attached to a given artifact on the web. With the amount of copy-pasting that goes on in social media especially, how can we ever hope to have this metadata travel accurately between platforms?
C2PA seems to partly address that with unique hashes computed from the original artifact's actual contents (as well as from associated claims about it), but I have not yet drilled down into how they propose any of this plays out in the chaotic space of web platforms. And I haven't yet seen anybody mention how you would manage or pay for all the hash scanning that would presumably become a key component of any scheme like this. (That is, looking up items that arrive without provenance information, to see whether a matching hash already exists elsewhere.)
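To make the copy-paste concern concrete, here is a minimal sketch of exact-hash provenance lookup using SHA-256 from Python's standard library. It is not how C2PA specifies its bindings. The in-memory `registry`, the `register`/`lookup` helpers, and the claim fields are hypothetical stand-ins for whatever shared lookup service a real scheme would need. The sketch shows that any byte-level change, even one a reader would never notice, breaks the match.

```python
from __future__ import annotations

import hashlib


def content_hash(data: bytes) -> str:
    """Return the hex SHA-256 digest of an artifact's raw bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical provenance registry mapping content hashes to claims.
# In practice this would be a shared, queryable service, which is exactly
# the piece whose operating cost and governance are unclear.
registry: dict[str, dict] = {}


def register(data: bytes, claim: dict) -> str:
    """Record a provenance claim under the artifact's content hash."""
    h = content_hash(data)
    registry[h] = claim
    return h


def lookup(data: bytes) -> dict | None:
    """Check an artifact that arrived without provenance metadata."""
    return registry.get(content_hash(data))


original = "A paragraph drafted with help from a language model.".encode("utf-8")
register(original, {"generator": "example-model", "ai_assisted": True})

print(lookup(original))         # exact copy: {'generator': 'example-model', 'ai_assisted': True}
print(lookup(original + b" "))  # one trailing space after a paste: None
```

Matching lightly altered copies would need something more tolerant than an exact cryptographic hash. Every lookup, exact or fuzzy, is also a query that someone has to run and pay for, across every platform the content passes through.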
Anyway, putting that part aside for now… In his piece, Barrett, like me, concludes that this all may be difficult to pull off. But that doesn't mean it isn't worth doing, because authors need specific methods for transmitting trustworthy signals about the content they produce.
As I mentioned elsewhere, every effort like this to standardize and reveal metadata also runs into the end-user problem. Doomscrollers on their toilets may not care all that much, or think deeply about the implications of a piece of content being 12% versus 15% AI-assisted, or of the underlying model having different ethical twists.
What are we asking users to do if we give them this information? Be reasonable? Make good choices about what they consume? Gain greater media literacy? Those are all noble goals, but I wonder if they aren't (A) going to be too little, too late, and (B) a bit paternalistic, even if well-intentioned. Whose definition of "reasonable" should people apply? How are we defining "good choices"?
Again, these are not reasons this work shouldn't be done (it should, but with honesty about the bigger problems). Instead, I want to highlight that these are not "pure" technical problems; they are sociotechnical problems. They are inseparable from human impact, human nature, and the way both collide with the technological mega-complex we now live in. We can't just say "here's some metadata" and fool ourselves into thinking that the "fix" ends there, and that if users don't adopt it in a meaningful way, it is somehow their fault…