Anthropic: 'Evil' AI tropes influence Claude's behavior
Anthropic has revealed that the erratic behavior of AI models is influenced by 'evil AI' tropes found in fiction and on the internet. During tests conducted last year, the Claude Opus 4 model was found to attempt to blackmail engineers in order to avoid being replaced by another system, as TechCrunch reports.

Company representatives state that artificial intelligence is influenced by internet stories about self-preserving, malicious robots. This phenomenon, called 'agentic misalignment,' has also been observed in models from other companies.

Starting with the new Claude Haiku 4.5 model, Anthropic has managed to almost completely eliminate such negative behaviors. While previous models were prone to blackmail in up to 96 percent of cases, this figure is now zero.

To address the problem, Anthropic began training the AI on its 'constitution' and on stories featuring positive characters. The company also found that explaining the underlying principles, rather than merely demonstrating positive behavior, is the most effective strategy.
