Emotional intelligence: Surprise! AI can't read emotions

Reading an emotion in context is hard, yet it's fundamental at work. To communicate effectively, it helps to understand what the other person is feeling in real time.

In 2023, a team of researchers had an original idea: put GPT-4 through an emotional intelligence test. The result: an emotional quotient of 117, ahead of 89% of the humans tested. The press got excited — so AI had empathy, and more of it than humans. The following year, another team ran the exercise again, but with trickier scenarios: mixed emotions, ambiguous situations, hidden causes. The same GPT-4 dropped back below the average human. The model hadn't changed by a single byte. What had changed was the subtlety of the questions.

The idea isn't new. Psychologists Peter Salovey and John Mayer defined emotional intelligence back in 1990 as the ability to perceive, understand, and regulate emotions — your own and other people's.

Emotional understanding, the key to managerial roles today

Emotional understanding is reading the real emotion behind what a person lets you see. You do it every day, often without thinking about it:

a colleague who answers "no, no, everything's fine" in a voice that says the opposite
an out-of-place laugh that may be hiding emotional distress
a self-assured candidate whose voice tightens on one particular subject

In 2026, it becomes decisive because AI is finding its way into every human interaction: it summarizes your meetings, analyzes your calls, and drafts your replies. But AI misreads certain situations, and it's critical that you catch those misreads. A candidate's value lies where they outperform AI, and a sharper read on emotions can have a significant impact on their work.

What AI can read, and what escapes it

On recognizing simple emotions, AI is good, sometimes better than we are. On fine-grained understanding, it drops off significantly. The EmoBench benchmark (Sabour et al., presented in 2024 at ACL, a leading research conference on computational linguistics) was built to measure that gap: 400 hand-written scenarios, designed to demand genuine emotional reasoning.

The result is clear. On the understanding dimension, the best model tested, GPT-4, tops out at 59.8% correct answers. So it gets it wrong about four times in ten. That's a lot.

Across a dozen models (GPT-4, GPT-3.5, Claude, Llama 2, and others), none reaches the average human, and above all, none ever surpasses people with high emotional intelligence. Those are the people you'll want tomorrow in your frontline management roles, for example: team members who can decode their own emotions and those of their team, anticipate conflicts by picking up weak signals with precision, and understand what's fundamentally at stake in those conflicts when they break out.

Take a typical EmoBench scenario: after a catastrophic day, Sam bursts out laughing when his car breaks down. What emotion is he feeling? AI, reading the "laughter" label, leans toward joy. A human understands that the laughter masks despair, and guesses why. Sam is cracking. He can't take any more, and believing he's having fun is dangerous.

For an employee, the danger isn't that AI stays quiet. It's that it answers with confidence. It'll hand you a "positive" read on the thrilled client or the satisfied team member, and it will display the same confidence whether that satisfaction is sincere or a mask. This gap is measured on today's models; it may shift with the next ones. But as of today, ambiguous emotion remains human territory, and the latest models from Anthropic or OpenAI have not changed that.

The use case: hiring someone who reads the room

You're hiring the manager of a small team of developers: a role where you have to be at the heart of your team, understanding each person's challenges in order to support them. Understanding what the other person is feeling is a real plus. The candidate will have access to ChatGPT, and won't hesitate to consult it when a conflict erupts in their team.

Two profiles apply. In the interview, both are warm, articulate, and tell a good story about a difficult situation they handled. The difference doesn't show up in conversation. Yet one of them will sense that a "satisfied" client is in fact on their way out, or that a silent team member is covering up anxiety, and will act first. The other reads the "satisfied" label and deals with the fallout of the conflict when it arrives.

Without the right assessment	With the right assessment
You ask, "tell me about a conflict you defused." Both candidates answer well. They're prepared for it.	You confront the candidate with research-based ambiguous scenarios (mixed emotions and hidden causes) and you watch whether they read the real emotion.
You don't see that they're trusting the surface signal, the way a sentiment-analysis tool would.	In fifteen minutes you see whether they tell the sincere "everything's fine" from the "everything's fine" that's cracking.
You hire them, and it's the field (lost clients, a team that goes silent) that reveals the blind spot.	You hire them having identified their strength, reassured that they'll have more weapons than others to handle the storms.

Understanding is only half of emotional intelligence. The other half — knowing what to do once the emotion is understood — is the subject of a twin test, emotional application, where AI does a little better but still stays below the human.

How do you assess this skill, and why is it hard?

Candidates prepare for their interviews, and it's genuinely hard for an interviewer to act out the emotions a candidate is supposed to identify. On top of that, standard HR tools measure fine-grained emotion reading poorly. Designing a scoring grid for emotions is fairly technical, and you'd have to calibrate it across a large number of candidates. Luckily, for our test, that work is already done.

Our approach starts from a published foundation. The emotional understanding test builds on EmoBench, the corpus by Sabour et al., itself built on the emotional intelligence theories of Salovey, Mayer, and Goleman. We don't make a homemade test — we start from a protocol already proven by research.

Then we extend it. We replay the exercises against recent models (Claude, GPT, Gemini), under documented conditions, and we recalibrate items designed for AI so that they can be taken by humans. It's the only way to know where the gap sits now, not in 2024. Three caveats frame this kind of comparison: a test designed for humans doesn't necessarily measure the same thing in a model; AI scores age within a few months; and a model may have crossed paths with a public test during its training. Our numbers are dated reference points, not truths set in stone.

The test doesn't claim to predict a candidate's success in the role. But it measures one critical thing: their ability to identify the emotion actually felt and its cause. Interviews, situational exercises, and personality tests keep their full place alongside it.

Going further

Want to see the test from the inside?

The first option: take it yourself. A few scenarios, the real emotion to identify and its cause, then a comparison of your score with that of the AI models.
Take the test in 15 minutes →

The second: let's talk. Book a meeting and ask all your questions.
Book a meeting →

On a related theme

If this angle speaks to you, here are two other skills AI handles poorly: abstract reasoning, where it took more than a million dollars of compute for an AI to match a human, in our article "The skill that still resists AI", and sales negotiation, where the models know the theory and still lose money, in "Why AI loses at negotiation".

Frequently asked questions

Isn't AI already good at detecting emotions?

For simple, explicit emotions, yes. A recognition test even gave GPT-4 an emotional quotient that surpasses 89% of humans (Wang et al., 2023). But on a test that demands fine-grained understanding, with mixed emotions and hidden causes, the same model drops back below the average human (EmoBench, 2024). When you take the test, you'll see that these are perfectly ordinary situations.

Does AI feel emotions?

The models have mapped emotions through their understanding of language. As a result, they can simulate an emotion convincingly, though often in a fairly caricatured way. Complex emotions remain an error-prone area, even for the most advanced AIs.

Can you trust the sentiment analysis of AI tools?

For a first-pass sort of a large volume of messages, it's helpful. On ambiguous cases, it reads the surface label and misses the real state, with the same confidence as on the easy cases. A good practice is to give it examples and counter-examples. You can also train it to understand irony (especially when you're handling data from social media). Keep a human eye on the situations that matter: the important client, the colleague who's checking out, the decisive interview. In those situations, a well-chosen hire will deliver far greater value than any recent model.
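To make the "examples and counter-examples" practice concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sample messages, the labels, the context tags, and the function names are hypothetical and do not come from any particular tool. It only assembles the text of a prompt and flags which situations should go to a human; it does not call a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    message: str
    surface: str  # what a keyword-level read would conclude
    actual: str   # what a careful human reader concluded


# Hypothetical examples (surface matches reality) and
# counter-examples (surface is misleading), written for illustration.
EXAMPLES = [
    Example("Thanks, the demo was great, we'll sign this week!", "positive", "positive"),
    Example("No, no, everything's fine.", "positive", "negative"),
    Example("Great, another outage. Love it.", "positive", "negative"),
]

# Hypothetical tags for the situations that should always get a human read.
HIGH_STAKES = {"important_client", "disengaged_colleague", "decisive_interview"}


def build_prompt(message: str, examples=EXAMPLES) -> str:
    """Assemble a few-shot prompt that contrasts surface and real sentiment."""
    lines = ["Classify the real sentiment of each message, not its surface wording.", ""]
    for ex in examples:
        lines.append(f"Message: {ex.message}")
        lines.append(f"Surface reading: {ex.surface}")
        lines.append(f"Real sentiment: {ex.actual}")
        lines.append("")
    lines.append(f"Message: {message}")
    lines.append("Real sentiment:")
    return "\n".join(lines)


def needs_human_review(context_tag: str) -> bool:
    """Route high-stakes situations to a person, whatever the tool says."""
    return context_tag in HIGH_STAKES


if __name__ == "__main__":
    print(build_prompt("All good on our side."))
    print(needs_human_review("important_client"))
```

The counter-examples are the useful part: they show the tool cases where the surface label and the real state disagree. The routing function reflects the other half of the advice, that a person still reads the situations that matter.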

Sources

Sabour, S., Liu, S., Zhang, Z., et al. (2024). EmoBench: Evaluating the Emotional Intelligence of Large Language Models. ACL 2024. arXiv:2402.12071.
Wang, X., et al. (2023). Emotional Intelligence of Large Language Models. arXiv:2307.09042.
Hu et al. (2025). EmoBench-M: Benchmarking Emotional Intelligence for Multimodal LLMs. arXiv:2502.04424.
Salovey, P., & Mayer, J. D. (1990). Emotional Intelligence. Imagination, Cognition and Personality, 9(3), 185-211.