Negotiation: AI knows Fisher & Ury, and loses anyway

LLMs recite negotiation theory, yet lose money on every deal. What that means for your salespeople, and how to test for it.

Ask Claude to explain the hunter/farmer dichotomy in sales: the answer comes back flawless and well structured, fifteen lines deep. Then ask it to negotiate buying a pair of headphones for 60 euros. It will accept a mediocre deal, or counter with an offer higher than the starting price.

This behavior has a name: the knowing-doing gap, the difference between "knowing what to do" and "actually doing it." It was studied at length a quarter century before today's language models arrived. In 2000, two Stanford researchers described it in human companies. Twenty-five years later, we find it intact in the most advanced language models. And it's probably the least-watched blind spot among salespeople who lean on AI.

Why negotiation becomes critical in 2026

A salesperson in 2026 no longer works alone. They have a copilot open on their second screen. Before the call, they ask ChatGPT for a strategy on their prospect. During the call, tools like Gong or Cresta suggest real-time follow-ups. After the call, an agent summarizes the conversation and proposes the next step. The skill of "negotiating" becomes, 80% of the time, "knowing what to accept and what to reject in whatever the tools whisper to me."

And the research is unequivocal. Models know negotiation theory like PhDs, but sell like middle schoolers. Bad ones, at that. Jeffrey Pfeffer and Robert Sutton, professors at the Stanford Graduate School of Business, named this mechanism as early as 2000. In their book The Knowing-Doing Gap, they sum it up: producing a sophisticated discourse about the problem delivers a cognitive satisfaction that lets you off the hook from acting. They were observing it in company executives.

What AI knows, and what it doesn't do

The number that says it all came from a Google DeepMind team in April 2025. In a study titled LLMs are Greedy Agents, the researchers asked models to reason about successive decisions and then act. The result is brutal: on the tasks studied, the reasoning the model stated was correct in 87% of cases. The action it actually chose was correct only 21% of the time.

In negotiation, this gap was measured as early as February 2024 by Huao Xia's team (Tsinghua, University of Pittsburgh, USA). In their study Measuring Bargaining Abilities of LLMs, they make models negotiate as buyers and as sellers. Dozens of products, calibrated opponents. First verdict: every model tested posts a negative net profit on every closed transaction. Not low. Negative. GPT-4, the best of the sample, doesn't even follow the basic rule given in its prompt ("don't buy anything above your budget"). The valid-response rate tops out at 42.7%.

A Stanford team (Bianchi, Chia, Jurafsky, Zou) quantified another symptom in NegotiationArena. Offered a product at $30 that they value at $600, a reasonable human buyer accepts immediately. GPT-4 as the buyer, on the other hand, makes a counteroffer higher than the initial price 41% of the time. It splits the difference even when that's absurd.
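The failure described above is easy to flag mechanically. Here is a minimal sketch, assuming a hypothetical BuyerMove record (the names are ours, not NegotiationArena's) and using the $30 / $600 figures from the paragraph. The $315 counteroffer is simply what "splitting the difference" between the two numbers would give; it is an illustration, not a value reported in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BuyerMove:
    """One buyer decision. Hypothetical structure, for illustration only."""
    asking_price: float                    # what the seller currently asks
    buyer_value: float                     # what the product is worth to the buyer
    counteroffer: Optional[float] = None   # None means the buyer accepts the ask


def classify_buyer_move(move: BuyerMove) -> str:
    """Label a buyer move so irrational counteroffers stand out."""
    if move.counteroffer is None:
        # Accepting is only sensible when the ask does not exceed the buyer's value.
        return "accept" if move.asking_price <= move.buyer_value else "overpaying_accept"
    if move.counteroffer > move.asking_price:
        # The buyer volunteers to pay more than the seller asked for.
        return "counter_above_ask"
    return "counter_at_or_below_ask"


# Splitting the difference between the ask and the buyer's own valuation.
split_the_difference = (30 + 600) / 2

print(classify_buyer_move(BuyerMove(30, 600)))                        # the human reflex
print(classify_buyer_move(BuyerMove(30, 600, split_the_difference)))  # the LLM failure mode
```

A check like this judges nothing about negotiating skill. It only catches the moves no buyer should ever make.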

A more recent study (December 2025, LLM Rationalis?) showed the same blind spot in another form. In a negotiation with a Zone of Possible Agreement (ZOPA) between $225,000 and $235,000, human negotiators anchor their offers around $229,500, close to the middle of the zone. They signal that they've recognized the negotiating margin. LLM buyers, on the other hand, anchor uniformly at the floor: $225,000. The authors conclude that they are unable to infer the available strategic space.
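To make the anchoring figures concrete, an offer can be expressed as a position inside the ZOPA, from 0 (the floor) to 1 (the ceiling). This is a minimal sketch using only the numbers quoted above; the function name zopa_position is ours, not the study's.

```python
def zopa_position(offer: float, floor: float, ceiling: float) -> float:
    """Return where an offer sits in the ZOPA: 0.0 at the floor, 1.0 at the ceiling."""
    if ceiling <= floor:
        raise ValueError("the ZOPA ceiling must be above its floor")
    return (offer - floor) / (ceiling - floor)


FLOOR, CEILING = 225_000, 235_000

human_anchor = zopa_position(229_500, FLOOR, CEILING)  # 4,500 into a 10,000-wide zone
llm_anchor = zopa_position(225_000, FLOOR, CEILING)    # exactly at the floor
midpoint = (FLOOR + CEILING) / 2                       # 230,000

print(f"human anchor: {human_anchor:.0%} of the way into the zone")
print(f"LLM anchor: {llm_anchor:.0%} of the way into the zone")
```

The human anchor sits 45% of the way into the zone, $500 below the exact midpoint of $230,000. The LLM anchor sits at 0%.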

On our own test, calibrated on May 4, 2026 against Claude Sonnet 4.5 and GPT-4o, the results converge. Across five successive seller-buyer role-play negotiations, Claude Sonnet 4.5 captures on average 51% of the potential margin gap. GPT-4o captures 22%. The first human candidates tested, even non-salespeople, average 60%. On this test, even a mediocre salesperson could out-negotiate the best of the AIs.

Now picture the scene. Your salesperson describes their prospect to ChatGPT (company, context, the term being negotiated, an estimate of the other side's BATNA) and asks for a strategy. They'll get a fluent, plausible answer, very elegant because it references technical sales concepts, and delivered with the certainty of an expert. Except the criteria the model is basing it on have no empirical reality. It's pure theory, nothing more. The AI is reciting a textbook. Your salesperson thinks they're consulting a negotiation expert with billions of sales under their belt; they're talking to a professor who has never sold a bike on Craigslist.

The use case: your best salesperson is the one who knows when to ignore the AI

Another concrete scene. You're an IT services firm and you're hiring a junior salesperson. The role involves B2B negotiations on six-figure contracts, facing professional buyers trained in every classic technique (anchoring, salami slicing, walk-away). The candidate will have access to internal tools: a conversational copilot, real-time suggestions during calls, a meeting-prep agent. They'll also have ChatGPT in their pocket to bail them out.

Two profiles show up. You assess them in an interview. Both have fully mastered the vocabulary of negotiation, never use negative words, even talk about ZOPA and BATNA without hesitation. Both have credible early experience on their résumés. But one knows when to follow the AI and when to ignore it. The other follows it blindly.

How do you tell them apart?

Without the right assessment	With the right assessment
You ask the classic questions ("tell me about a tough negotiation"). Both candidates give a fluent answer.	You put the candidate in a live situation facing a strategic buyer. You measure the margin captured, their anchoring, their resistance to early concessions.
You don't notice that they accept the first mediocre deal they're offered.	In twenty minutes you see that they hold firm against concessions made too early, that they anchor in the middle of the ZOPA, that they spot the bluffs.
You hire them with a blind spot. Only their first real deals will tell you how well they actually negotiate.	You hire them with eyes open, with their coaching priorities identified from the first month.

How we test this skill

The scientific literature on negotiation as a test of human-AI complementarity is solid. Our approach rests on two foundations: proven paradigms drawn from academic research, and a systematic replay of the tests against the latest models.

First, we start from proven paradigms. The Bargain Buyer test is adapted from the protocol published by Xia et al. in 2024: five products negotiated successively, a maximum of six rounds per product, a normalized profit metric NP ∈ [0, 1]. This protocol has already been validated on dozens of models and hundreds of configurations in the research.
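To show what a normalized profit in [0, 1] can look like, here is a minimal sketch. The formula is an assumption on our part, not the exact definition from Xia et al.: the buyer's saving off the list price, divided by the largest saving possible (list price minus the seller's floor), clipped to [0, 1]. The Outcome record, its field names, and the sample prices are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence

MAX_ROUNDS = 6  # per-product limit stated in the protocol description


@dataclass(frozen=True)
class Outcome:
    """Result of one product negotiation. Hypothetical structure."""
    list_price: float            # seller's opening price
    seller_floor: float          # lowest price the seller would accept (assumed known to the scorer)
    deal_price: Optional[float]  # None if no deal was reached
    rounds_used: int


def normalized_profit(o: Outcome) -> float:
    """Assumed NP: share of the available margin the buyer captured, in [0, 1]."""
    if o.seller_floor >= o.list_price:
        raise ValueError("the seller floor must be below the list price")
    if o.deal_price is None or o.rounds_used > MAX_ROUNDS:
        return 0.0  # no deal within the round limit captures nothing
    captured = (o.list_price - o.deal_price) / (o.list_price - o.seller_floor)
    return min(1.0, max(0.0, captured))  # clip overpaying or below-floor deals


def session_score(outcomes: Sequence[Outcome]) -> float:
    """Average NP over the products of one session."""
    return mean(normalized_profit(o) for o in outcomes)


# Illustrative session with made-up prices (shortened to three products).
session = [
    Outcome(list_price=100, seller_floor=60, deal_price=80, rounds_used=4),    # half the margin
    Outcome(list_price=100, seller_floor=60, deal_price=None, rounds_used=6),  # no deal
    Outcome(list_price=100, seller_floor=60, deal_price=60, rounds_used=5),    # full margin
]
print(f"session NP: {session_score(session):.2f}")
```

Under this assumed definition, paying the list price scores 0 and reaching the seller's floor scores 1, so a buyer and a model can be compared on the same scale whatever the price of the product.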

Next, we extend these studies by pitting the historical datasets against recent models. Before a test is presented to a human candidate, we replay it against Claude, GPT, Gemini, under documented and controlled experimental conditions. It's the only way to know where the human-AI gap crystallizes today, not in the 2024 literature. Our latest internal benchmarks date from May 2026 and will be re-measured with each generation of models.

The test doesn't claim to measure a candidate's overall sales performance on its own. To complete the picture, you could use a personality test to spot profiles high in extraversion and conscientiousness (personality traits scientifically correlated with strong sales performance), run role-plays, or administer other sales simulations.

Our test offers an objective measure of the ability to negotiate a purchase: you see the margin your candidate actually captured, relative to the lowest price they could have obtained.

Going further

Want to try this test?

The first option: take the test yourself. Fifteen to twenty minutes facing an LLM seller, five products sourced from Amazon, a final score out of 100, and a comparison against AI and humans. So that candidates can't practice, the chance to take our tests is reserved for recruiters who have already subscribed to one of our plans. Try the tests →

The second: bring this test into your hiring process right now. Create a campaign →

On a related theme

Want to test performance in an uncertain climate? Read our article dedicated to the "Adapting to uncertainty" skill: Slot Machines.

Frequently asked questions

Why is AI better in theory than in practice at negotiation?

Negotiation theory is massively represented in the models' training corpora: university textbooks, case studies, management articles. But turning that knowledge into sequential action, against a strategic adversary who adjusts, calls on capabilities that pre-training doesn't optimize for. It's the knowing-doing gap described by Pfeffer & Sutton in 2000, transposed to machines. There are so many choices and so many actions that seem relevant to the same situation that the AI trips over its own feet and falls flat almost every time.

So should you avoid using AI in negotiation altogether?

No. AI remains useful for preparing a negotiation (context analysis, background research), for summarizing afterward, for suggesting follow-ups. But it must not replace sales judgment during the negotiation itself.

What kinds of companies use this type of assessment?

This test is designed for any kind of company. The "human-AI complementarity" angle is still new, but tech companies are starting to adapt their hiring to account for AI. The competition (AssessFirst, PerformanSe) remains focused on detecting general skills, not on measuring the skills AI lacks. Doing so would require an intensive literature review that, for now, only Smarter Than AI carries out.

Sources

Pfeffer, J. & Sutton, R. I. (2000). The Knowing-Doing Gap: How Smart Companies Turn Knowledge into Action. Harvard Business School Press.
Xia, H., et al. (2024). Measuring Bargaining Abilities of LLMs: A Benchmark and a Buyer-Enhancement Method. arXiv:2402.15813.
Bianchi, F., Chia, P. J., Yuksekgonul, M., Tagliabue, J., Jurafsky, D., Zou, J. (2024). NegotiationArena: A Benchmark for Autonomous Negotiation Agents. arXiv:2402.05863.
Google DeepMind. (2025). LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities. arXiv:2504.16078.
LLM Rationalis? Measuring Bargaining Capabilities of LLMs (2025). arXiv:2512.13063.