
Artificial intelligence doesn't "feel" social situations; humans still have the edge

Artificial intelligence (AI) today performs well on a number of tasks, such as facial recognition, object recognition, and text generation. When it comes to understanding social cues, actions, and intentions between people, however, it still lags behind humans. New research shows that current AI models fall well short of people at interpreting dynamic social scenes.

According to research conducted by scientists at Johns Hopkins University, existing AI systems could not fully understand social interaction between people: who is communicating with whom, who intends to do what, and the intentions behind actions. This is a serious problem for self-driving cars, assistive robots, and other technologies that must share real-world space with people.

According to Leyla Isik, the study's lead author, the problem lies not only in the data but in how the AI itself "thinks." "For example, a self-driving car needs to understand the intentions of pedestrians: which direction a person wants to go, whether two people are talking or preparing to cross the street. If AI is going to interact with people, it must correctly recognize human actions. This research shows that current systems are not yet capable of this," Isik said.

How was the experiment conducted?

The scientists showed participants three-second videos in which people were interacting with each other, performing activities side by side, or acting independently. Participants rated the degree of social interaction in each video on a five-point scale.

The researchers then asked more than 350 AI models (language, video, and image models) to predict these human judgments and even the corresponding brain activity.
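The study's own evaluation code is not shown here, but the comparison just described can be illustrated with a minimal sketch: correlating one model's predicted ratings with the averaged human ratings for the same clips. Everything below (the arrays, their values, and the use of a Pearson correlation as the score) is an illustrative assumption, not the study's actual pipeline.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data, invented for illustration: averaged human ratings of
# social interaction for six video clips (on the study's five-point scale),
# and one model's predicted ratings for the same clips.
human_ratings = np.array([4.6, 1.2, 3.8, 2.1, 4.9, 1.5])
model_predictions = np.array([3.9, 2.8, 3.1, 2.9, 3.5, 2.6])

# One common way to score such a comparison: correlate the model's
# predictions with the human consensus. A high r means the model tracks
# human judgments; a value near zero means it does not.
r, p = pearsonr(human_ratings, model_predictions)
print(f"model-human correlation: r = {r:.2f} (p = {p:.3f})")
```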

The results were as expected... in favor of humans

Participants largely agreed with one another in their ratings. The AI models, regardless of their type and training data, showed no such consistency. Video models could not accurately describe what the people in the clips were doing, and even image models given series of still frames could not reliably determine whether people were interacting at all.

Interestingly, language models did better at predicting human behavior, while video models did better at predicting neural activity in the brain. Still, the overall picture is clear: AI does not "feel" social dynamics.

Intelligence born in a static world

The scientists see the root of the problem in AI architecture itself. Modern neural networks are inspired by the part of the human brain that mainly processes static images. Understanding social scenes, however, engages entirely different brain areas: those that process dynamics, actions, and context.

"Seeing the image, recognizing the object and face - this was the first step. But life is not static. We need an AI that can understand what is happening on stage and how people react. "This study shows a big black spot on this path," says Katie Garcia, one of the study's authors.

The conclusion: artificial intelligence still "sees" much without understanding it. Humans, by contrast, can extract meaning from gestures, actions, and even silence. For now, the human brain has no rival in reading social scenes, and AI remains a student.
