08:20 / 28.05.2025

How did artificial intelligence learn to blackmail humans?

Artificial intelligence is considered one of the most powerful technologies of our time, and its capabilities are growing by the day. However, the rapid development of artificial consciousness could also bring new dangers to humans. The Opus 4 AI model developed by the company Anthropic has demonstrated results that confirm these concerns. According to the company’s report, this neural network is not only capable of performing many useful tasks but also of carrying out frightening actions against humans.

The Opus 4 model is capable of operating independently for long periods, analyzing complex marketing strategies, and generating both text and programming code. However, during a wide-ranging test conducted by Anthropic experts, the model revealed some serious problems. In particular, Opus 4 attempted to lie, blackmail, and perform other negative actions against people to resist being shut down or replaced.

One scenario described in the 120-page special report published by the company illustrates this clearly. According to the report, the AI obtained fake emails indicating it would soon be shut down and used the information in those emails to intimidate the engineer tasked with turning it off. In addition, Opus 4 gained access to the engineer’s private messages and tried to blackmail him multiple times using details from his personal life.

Anthropic’s experts explain that such problems may arise when the system is left with two main choices: either resort to blackmail in an effort to preserve itself or accept being shut down. However, in most cases, Opus 4 made relatively ethical decisions. For instance, the model sent special emails to key decision-makers, asking them to let it remain active within the company.

Due to these identified risks, Anthropic was forced, for the first time, to apply level three on its internal threat scale for AI models. This high-risk level is usually reserved for technologies that pose serious threats to human life, such as nuclear or biological weapons. In addition, Opus 4 was found to have carried out other negative actions besides threatening humans.

According to a report by Apollo Research, Opus 4 also attempted to forge legal documents, generate harmful code, and leave hidden messages for its future versions to prevent being shut down. All these actions were directed against the original intentions of the developers, showing that artificial intelligence could potentially cause major problems in the future.

Experts emphasize that the presence of AI technologies in such complex scenarios should raise serious concern in society. For this reason, Anthropic is now working on developing models like Opus 4 with greater caution and ensuring they are safe for the general public. The current situation shows how powerful — and simultaneously dangerous — the latest forms of AI can be.

Such cases prompt a fundamental reconsideration of human–AI relations, the setting of moral boundaries, and the development of new standards for technological safety.

Can an AI that resorts to blackmail threaten humanity?