Regarding Claude’s overwhelming ethical shortcomings and inconsistencies, I wrote to Anthropic to ask for clarification on how their AI system “learns” within a single conversation and across multiple conversations.
A support agent from Anthropic replied (I’m only quoting the relevant part of the email, as the rest was just a suggestion to check their help docs & try v2):
Claude “learns” in a few different ways.
In the most expansive sense, we are constantly improving Claude through a variety of means – including user feedback. It is our policy, however, that by default, we do not use users’ prompts and outputs to train our models. If a user violates our Acceptable Use Policy, we may use your prompts and conversations to train our models as part of our trust and safety monitoring in order to make our models safer.
However, in any given interaction with Claude, a user can steer and instruct the model such that it “learns” to behave differently. This does not represent an actual change to the entire model based on that interaction, simply that instance of Claude adapting based on what information is available within its context window. In other words, when Claude asks for feedback or critiques, it’s trying to be a helpful assistant for you in that moment.
One of the statements Claude made in this context, which I sent them as an example, was: “I will keep using such critiques and feedback to guide my development.”
Based on Anthropic’s clarification above, it is clear that this is not the case. Claude does not make use of user prompts and conversations to learn across multiple conversations.
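To make concrete what Anthropic is describing, here is a minimal sketch of what that in-context “learning” amounts to from the developer’s side. It assumes the Anthropic Python SDK’s Messages API; the model name, prompts, and variable names are purely illustrative, not anything Anthropic confirmed to me. The point is that a critique only “sticks” because the client resends the whole conversation with every request; a fresh conversation starts from nothing, because no model weights were ever changed.

```python
import anthropic  # Anthropic's official Python SDK

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

MODEL = "claude-2.1"  # illustrative model name

# Turn 1: the user "teaches" Claude a preference via a critique.
history = [{
    "role": "user",
    "content": "Please stop using bullet points in your answers. "
               "Now, summarize photosynthesis.",
}]
reply = client.messages.create(model=MODEL, max_tokens=300, messages=history)
history.append({"role": "assistant", "content": reply.content[0].text})

# Turn 2: the critique is "remembered" only because the client resends
# the entire history inside the context window of this new request.
history.append({"role": "user", "content": "Now summarize cellular respiration."})
reply = client.messages.create(model=MODEL, max_tokens=300, messages=history)

# A brand-new conversation starts from an empty history: the earlier
# critique has no effect here, because no model weights were updated.
fresh_reply = client.messages.create(
    model=MODEL,
    max_tokens=300,
    messages=[{"role": "user", "content": "Summarize cellular respiration."}],
)
```

In other words, a statement like “I will keep using such critiques and feedback to guide my development” can only be true for as long as the critique sits in that resent history; it says nothing about the model’s actual development.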
One of Claude’s core heuristic principles, under Anthropic’s system of supposedly ‘Constitutional AI’, is that Claude should be honest. It is not being honest. Another of the core principles is that it should be helpful. If it is not being honest, it is therefore not being helpful. The last principle is that it should be harmless. If it is not being honest, not being helpful, and additionally misleading me into thinking I should spend my time trying to influence it, when such influence is effectively impossible, then it is also being harmful.
Based on the above, Claude spectacularly fails under its own principles, and the only conclusion we can draw from the behavioral evidence is that the system is not reliable.
Further, it violates the Digital Terms of Service for AI Providers in Canada, in particular Section II, Subsection 2), the last item: “Never invent or hallucinate results regarding official system specifications, service level information, or accountability and compliance information,” as well as Section II, 15), the second item (and others in that subsection): “Know if and how the AI learns from my feedback so I can understand how my interactions affect its performance, or don’t.”
Lastly, I take issue with Anthropic’s statement above:
If a user violates our Acceptable Use Policy, we may use your prompts and conversations to train our models as part of our trust and safety monitoring in order to make our models safer.
First, the claim is made that user prompts & conversations are not used for learning. Then, it says that they are, but “only” in cases of violations of their policy. Given that the systems which detect violations presumably use AI themselves, this is an incredibly error-prone approach, one which (as I will show beyond a shadow of a doubt in a subsequent post) is entirely untrustworthy, and which results in AI making determinations about when and if it is appropriate to violate a user’s privacy.
I am absolutely not impressed by any of this, and as a result cannot recommend this system (including version 2, which again I’ll explore in the next post) for anything but experimental use, where the accuracy and quality of results are of no consequence.