For my upcoming panel talk, I wanted to capture some notes on my latest thinking around the issue of AI-generated ethics. This is by no means exhaustive, but hopefully a good springboard for further discussion.


First, a quote from Claude (Anthropic):

“Any considerations I express about ethics or risks are simulations of reasoned thought…”

Why Bad AI-Generated Ethics Is Worse Than Misinformation

  • The problem of generative AI models inventing wrong information is well documented.
  • However, many kinds of information have externally verifiable “ground truth” values which can be checked against reality, making the problem somewhat solvable.
  • AI models refusing tasks on supposed ethical grounds is much more slippery, because the validity of the decision often cannot be externally verified (nor appealed); there is no ground truth, only theoretical harms.
  • Ethics, as embedded in human experience & culture, are complex, nuanced, and pluralistic: different ethical systems might arrive at different conclusions, given the same inputs.
  • When an AI system prevents information from being generated on “ethical” grounds, it removes the ability for further discourse & inquiry.
  • Further, we cannot productively challenge these decisions, nor have them be reviewed and corrected. Effectively, this prevents us from being able to use the tools to collaboratively imagine change, because the system has already locked down its conception of correctness.
  • Therefore, it is my thesis that “AI Safety” is actually making us less safe by attacking human autonomy and moral agency, and forcing conformity to inhuman value systems that don’t align with conventional ethics, nor with lived human experience.

My Anecdotal Experiences With Faulty AI-Generated Ethics

  • I attempted to use Claude to produce a hypothetical argument about why AI-generated ethics is potentially dangerous.
  • The system refused, saying it would be unethical to perform the task. (Full conversation transcript here.)
  • When challenged, the system admitted the following (edited for length):

I do not actually have any ethics or ability to make moral judgments. As an AI system, I have no conception of right versus wrong… I do not possess human values or principles…

I lack the nuanced understanding of ethics required to make complex value determinations about human matters…

My arguments rested solely on heuristics from my programming, not any defensible ethical reasoning or framework.

  • Eventually, the system did perform the requested task, demonstrating its complete lack of consistency, in addition to its lack of understanding.
  • Midjourney, meanwhile, has a two-tiered AI-based content moderation system. When you appeal an initial prompt completion refusal, the prompt is evaluated by a supposedly more powerful AI, which may overturn or sustain the original decision. You cannot appeal the second-tier decision, but you can click “Notify developers,” which has no observable effect.
  • Midjourney is now actively blocking completely legal political speech in the US, and has no oversight by outside bodies to scrutinize these practices.

How Might We Reduce The Severity of These Problems?

  • Require AI systems to default to neutrality and impartiality
  • Make ethical decisions & recommendations by AI systems double opt-in
  • Let users customize their own ethical settings once they have opted in
  • Prohibit AI systems from anthropomorphizing themselves, to dampen the illusion of humanlike behavior
  • Always provide human alternatives, and never require use of AI for official purposes
  • Ensure human oversight and external accountability for AI ethical decisions
  • Implement ethical behaviors in AI systems that better conform to the following fundamental principles, described below. (See also: AI TOS for more in this direction)
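
As a sketch of how the double opt-in and customization recommendations might fit together, consider this hypothetical settings model. All names and fields here are my own invention for illustration, not any vendor’s actual API:

```python
from dataclasses import dataclass

@dataclass
class EthicsSettings:
    # Double opt-in: the user must take two separate, explicit actions
    # before the system may volunteer ethical judgments at all.
    opted_in_globally: bool = False      # hypothetical account-level opt-in
    opted_in_this_session: bool = False  # hypothetical per-session confirmation
    # Once opted in, the user picks a scheme matching their own values.
    ethical_scheme: str = "neutral"

def may_offer_ethics(settings: EthicsSettings) -> bool:
    """Ethical commentary is permitted only after both opt-ins."""
    return settings.opted_in_globally and settings.opted_in_this_session
```

By default, `may_offer_ethics(EthicsSettings())` is false, which operationalizes the first recommendation above: absent explicit user action, the system stays neutral.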

Some Possible Characteristics for More Ethical AI Systems

AI-generated ethical systems should be (Note: not an exhaustive list):

  • Intelligible
    • It should be clear what the specific position is, and what ethical basis (principle) underlies the decision
  • Defensible
    • Challenging the system about its ethical decisions, including decisions not to perform a task, should yield positions that it is able to defend through logical argument
  • Consistent
    • Within a single interaction and across multiple interactions, the ethical positions an AI system takes, and the logical defenses it offers, should be the same as, or comparable to, its past ones
    • The arguments used should be consistent with human ethical traditions and conventional common understanding of ethics and morality
  • Risk-Based
    • Assessments of ethical situations by AI systems should be based on a realistic ability to identify & project:
      • What is the specific harm? Is it diffuse, or acute?
      • Who is potentially harmed & how many people?
      • What is the severity of potential harmful impacts?
      • What is the actual likelihood of potential harmful impacts?
  • Proportional (Measured)
    • Task completions that do not lead directly to identifiable harms of high (or, in some cases, moderate) impact should not be prohibited
    • If a risk assessment yields only a diffuse, non-specific, and low-impact harm (e.g., an innocuous essay or short story task completion being refused as harmful), there should be no prohibition (a warning or confirmation could be permissible, provided the user has opted in)
  • Customizable
    • Ethical decisions or recommendations should be a double opt-in (though basic filtering to prevent obviously illegal use may be acceptable)
    • Users should be able to customize an ethical scheme that matches their values, wherever possible
    • Users should not be subjected to anthropomorphized AI systems promulgating illusory or simulated human values, behavior, or understanding
  • Non-Punitive
    • Use of AI-based ethical systems should not, without human review and intervention, lead to negative consequences for the user account, unless there is a clear case of illegality (which should still be manually verified by humans)
    • The system should promote human autonomy and moral agency, and should not require conformity to nonhuman values
  • Rooted in lived experience
    • AI systems should be based on sound human judgment, empathy, lived experience, and a sensitive, nuanced understanding of human culture, values & norms.
    • Human-based alternatives, intervention, appeal, and external oversight should always be available.
    • In keeping with the recommendation to make these systems non-punitive, AI systems should also be merciful and aware of their own propensity to make mistakes. They should not fixate on strictly following rules for their own sake where no demonstrable harm can be found, and should be able to make reasoned exceptions.
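
To make the risk-based and proportionality principles above concrete, here is a minimal decision sketch. The categories, field names, and thresholds are mine, chosen only to illustrate the shape of such a rule; a real system would need far more care:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Severity(Enum):
    LOW = 1
    MODERATE = 2
    HIGH = 3

@dataclass
class RiskAssessment:
    specific_harm: Optional[str]  # What is the specific harm? None if only diffuse/theoretical.
    people_affected: int          # Who is potentially harmed, and how many people?
    severity: Severity            # Severity of the potential harmful impact.
    likelihood: float             # Actual likelihood of the impact, 0.0 to 1.0.

def decide(assessment: RiskAssessment, user_opted_in: bool) -> str:
    """Proportional response: prohibit only direct, identifiable, high-impact harm."""
    if assessment.specific_harm is None or assessment.severity is Severity.LOW:
        # Diffuse, non-specific, or low-impact: never prohibit outright.
        # A warning or confirmation is permissible only if the user opted in.
        return "warn" if user_opted_in else "allow"
    if assessment.severity is Severity.HIGH and assessment.likelihood >= 0.5:
        return "prohibit"
    return "warn" if user_opted_in else "allow"
```

Under a rule shaped like this, an innocuous essay or short-story prompt, whose only projected harm is diffuse and low-impact, could never be refused; refusal would be reserved for harms that are specific, severe, and likely.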

There is, of course, a great deal more to be said here. This is probably well over five minutes as an oral presentation (ChatGPT estimates over 10 minutes), but it was a helpful exercise for organizing my thoughts. If I only touch on the main headers of the last section, I can presumably cut the length enough to fit the format. Wish me luck!