ChatGPT-4 outperforms human psychologists in test of social intelligence, study finds


A new study published in Frontiers in Psychology investigates how AI compares to human psychologists in understanding and responding to human emotions and needs during counseling. The study specifically examined large language models, such as ChatGPT-4, Google Bard, and Bing, assessing their social intelligence — a critical skill in psychotherapy.

ChatGPT-4 outperformed all participating psychologists, while Bing surpassed more than half of them. Google Bard, however, performed only on par with bachelor's-level students and fell significantly short of the doctoral students.

Large language models (LLMs) are advanced artificial intelligence systems designed to understand and generate human-like text by processing vast amounts of written data. These models are trained on diverse internet text to capture nuances in language, context, and syntax.

Through techniques known as deep learning, particularly using structures called neural networks, LLMs can perform a variety of tasks such as answering questions, translating languages, summarizing long articles, and even engaging in conversation that feels strikingly human.

While previous research has shown that LLMs can diagnose and help manage mental health conditions, there was a gap in understanding specifically how these models perform in social contexts, particularly against human psychologists who are skilled in navigating complex emotional interactions.

“The use of artificial intelligence models in counseling and psychotherapy represents a major challenge for psychologists, due to concern that it may take their place in these important tasks,” said study author Fahmi Hassan Fadhel, an associate professor of clinical psychology at Qatar University. “The superiority of artificial intelligence in the areas of perceiving and understanding people’s emotions may mean that it will perhaps be more useful than a human psychotherapist, which is a very concerning issue.”

The study included 180 male psychologists from King Khalid University in Saudi Arabia, divided by educational level into bachelor's and doctoral students. The AI participants included some of the most advanced LLMs available: OpenAI's ChatGPT-4, Google Bard, and Microsoft Bing.

Each participant, both human and AI, was asked to respond individually to 64 scenarios presented in the Social Intelligence Scale. This scale was chosen because it is well-established and offers a reliable measure of the social skills that are crucial in psychotherapy. The responses were collected and scored according to predefined criteria.

The items were designed to measure two primary dimensions of social intelligence: sound judgment of human behavior and the ability to act wisely in social situations. Sound judgment involves understanding social experiences through observation of human behavior, while the ability to act wisely involves analyzing social problems and choosing appropriate solutions.

The results indicated a significant variance in the performance of different AI models and human psychologists, suggesting that some AI systems have advanced to a point where they can outperform human professionals in specific aspects of social intelligence.

Among the AI models evaluated, ChatGPT-4 stood out by demonstrating the highest level of social intelligence. It scored 59 out of 64 on the Social Intelligence Scale, effectively surpassing the performance of all human psychologists in the study. The average social intelligence scores were 39.19 for bachelor’s students and 46.73 for doctoral students.

Bing also performed well, scoring 48 out of 64. This score meant that Bing outperformed 90% of the bachelor's students and matched or exceeded 50% of the doctoral students.

In contrast, Google Bard exhibited a lower level of social intelligence in this study. It scored 40 out of 64, placing it roughly on par with the bachelor's-level psychologists but significantly below the doctoral students.

The findings serve as a benchmark for future development of AI systems intended for psychotherapy and counseling. Knowing that AI can match or even exceed human performance in social intelligence tasks provides a strong foundation for further integrating these technologies into mental health services.

“The study provides a quick overview of the rapid developments in artificial intelligence,” Fadhel told PsyPost. “It’s a bright outlook for the near future.”

However, the study also raises important questions about training, development, and the ethical use of AI in sensitive areas like mental health, where the ability to empathize and form therapeutic relationships is traditionally viewed as uniquely human.

“Perhaps the biggest caveats will relate to the capabilities of artificial intelligence in the future to understand and analyze human feelings and make decisions based on that,” Fadhel said. “We do not know where developments in this field are headed. To date, the controls imposed on artificial intelligence developers are still at their lowest levels, according to our knowledge.”

The study, “Artificial intelligence and social intelligence: preliminary comparison study between AI models and psychologists,” was authored by Nabil Saleh Sufyan, Fahmi H. Fadhel, Saleh Safeer Alkhathami, and Jubran Y. A. Mukhadi.