In computer security, there is a concept originally known as responsible disclosure, or apparently more properly as Coordinated Vulnerability Disclosure (CVD). From Wikipedia, CVD is:
…a vulnerability disclosure model in which a vulnerability or an issue is disclosed to the public only after the responsible parties have been allowed sufficient time to patch or remedy the vulnerability or issue. This coordination distinguishes the CVD model from the “full disclosure” model.
It’s a complicated topic, with proponents of different approaches having different supporting rationales. In the context of pure security, where the confidentiality, integrity, or availability of information systems is at stake around a given vulnerability, it makes a lot of sense to me that a CVD model could be most appropriate. You wouldn’t want to exacerbate specific harms to the systems (especially critical infrastructure) or the confidential user data they contain by publicizing a vulnerability too broadly before it can be patched.
In cases like that, what is at stake are acute harms. Real people are specifically harmed, usually in fairly measurable ways that have identifiable impacts on their lives. Preventing and reducing acute harms, especially those with high potential or actual impact, is one of the central focuses of risk mitigation as a discipline.
My experience, as both an observer of and a participant in discovering and disclosing what we might call “responsible AI” failures/flaws/vulnerabilities, suggests that the majority of these incidents are something other than acute harms. Instead, I think of them as diffuse harms.
When a harm is more diffuse, you might know that it’s “bad,” but it can be harder to pin down exactly which people are harmed by it, how much, and what the actual impact might be. Something can be biased or discriminatory, and that can absolutely be socially harmful to people matching certain identity characteristics, but it is a qualitatively different harm than, say, a breach of their private data that leaves them victims of credit card fraud and identity theft.
Coordinated vulnerability disclosure, or so-called responsible disclosure, systems might include a bug bounty, in which security researchers are financially incentivized to report a vulnerability to the company and to agree not to disclose it publicly until after it has been fixed. The companies, likewise, benefit from having outside researchers uncover potentially serious security issues for relatively low cost in bounty payouts. It may not be a perfect system every time, but at least the incentives of the two parties are somewhat aligned.
When it comes to disclosing responsible AI failures, though, some analogous examples do exist, but we don’t yet see these mechanisms in play at a large scale or in an organized way in industry. ChatGPT found me a paper I haven’t read yet on this topic, in which the authors propose a “Coordinated Flaw Disclosure” methodology. The paper is a little heavy on jargon-y frameworks for my taste, but I think the basic idea is at least worth opening up for further conversation.
Whereas a security vulnerability might or might not (depending on circumstances) benefit from being aired publicly, the vast majority of responsible AI failures and vulnerabilities would indeed benefit from public scrutiny and conversation, because that is how ethical and moral issues are best sorted out: by talking through them, expressing and comparing our values, and coming to meaningful conclusions by synthesizing multiple viewpoints.
AI systems are being deployed at an unprecedented scale with huge societal impacts. Since all of us are becoming more and more affected by them, we deserve visibility into their known risks and flaws. Public discourse ensures that AI developers are held accountable and can’t just quietly fix or ignore a problem and hope people forget. It also enables interdisciplinary expertise to be brought into conversation about complex sociotechnical issues, opening up new potential solutions or approaches.
My experience has been that, unless there is a team or role dedicated to correcting for exactly this, responsible AI (and non-AI) failures tend to persist because business priorities place growth above most other functions (like fixing things that are obviously broken). Third-party researchers disclosing flaws only privately might enable companies to downplay the severity of an issue and “let it ride.” Obviously, researchers under contract with the company are a different story. But even so, perhaps companies themselves, when they discover these things internally, ought to be transparent about what they found and what they’re doing to fix it.
Anyway, I’ve had these thoughts swirling around for a while now, so it’s good to get them off my chest in some sort of fixed form, even if just as a v1. Gotta plant your flag somewhere to begin!