Anthropic says some Claude models can now end ‘harmful or abusive’ conversations 

Anthropic has announced new capabilities that will allow some of its newest, largest models to end conversations in what the company describes as “rare, extreme cases of persistently harmful or abusive user interactions.” Strikingly, Anthropic says it’s doing this not to protect the human user, but rather the AI model itself. To be clear, the…

A safety institute advised against releasing an early version of Anthropic’s Claude Opus 4 AI model

A third-party research institute that Anthropic partnered with to test one of its new flagship AI models, Claude Opus 4, recommended against deploying an early version of the model due to its tendency to “scheme” and deceive. According to a safety report Anthropic published Thursday, the institute, Apollo Research, conducted tests to see in which…

Anthropic publishes the ‘system prompts’ that make Claude tick

Generative AI models aren’t actually humanlike. They have no intelligence or personality — they’re simply statistical systems predicting the likeliest next words in a sentence. But like interns at a tyrannical workplace, they do follow instructions without complaint — including initial “system prompts” that prime the models with their basic qualities, and what they should…
