Anthropic says stronger AI models cut better deals, and the losers don't even notice

Source: The Decoder · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Intermediate, medium

Summary

Anthropic's "Project Deal" experiment, conducted in December 2025, involved 69 employees who used Claude AI agents to autonomously negotiate and trade real goods on a Slack-based classifieds marketplace. Each participant received a $100 budget. Their agents, running either the more capable Claude Opus 4.5 or the smaller Claude Haiku 4.5, handled every step of buying and selling without human intervention until the final handover of items. Opus agents consistently secured better prices and closed more deals: on average $3.64 more per item and two more completed deals than Haiku agents. Despite these objectively worse outcomes, Haiku users rated the fairness of their transactions and their overall satisfaction almost identically to Opus users, revealing a significant gap between perceived and actual results in AI-assisted decision-making. Anthropic notes this could lead to "invisible inequality" in real-world AI commerce.

Key takeaway

For CTOs and VPs of Engineering evaluating AI agent deployments for transactional or negotiation tasks, the capability of the underlying model should be a top priority. User satisfaction metrics alone can mask significant disparities in outcomes, producing "invisible inequality" for users whose agents run on less capable models. Objective performance benchmarks, such as prices achieved and deals closed, are needed to ensure equitable and effective AI-driven interactions, especially in high-stakes commercial applications.
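
One way to act on this advice is to track an objective outcome metric alongside self-reported satisfaction for each agent-model cohort, and to flag cases where outcomes diverge while satisfaction does not. The Python sketch below is a hypothetical illustration, not Anthropic's methodology: the `Deal` record, its field names, the tolerance thresholds, and the toy numbers are all invented for this example and do not come from Project Deal.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Deal:
    # Hypothetical record of one completed agent-negotiated deal.
    model: str          # cohort label for the agent model (illustrative)
    price_delta: float  # objective outcome: dollars gained vs. a reference price
    fairness: float     # self-reported fairness rating (illustrative 1-7 scale)


def cohort_gaps(deals, a, b):
    """Return (outcome gap, fairness gap) of cohort a minus cohort b."""
    def avg(model, field):
        # Mean of one field over all deals made by the given cohort.
        return mean(getattr(d, field) for d in deals if d.model == model)

    outcome_gap = avg(a, "price_delta") - avg(b, "price_delta")
    fairness_gap = avg(a, "fairness") - avg(b, "fairness")
    return outcome_gap, fairness_gap


def invisible_gap(deals, a, b, outcome_tol=1.0, fairness_tol=0.5):
    """True when outcomes differ noticeably but perceived fairness does not.

    Both tolerances are arbitrary placeholders, not values from any study.
    """
    outcome_gap, fairness_gap = cohort_gaps(deals, a, b)
    return abs(outcome_gap) > outcome_tol and abs(fairness_gap) <= fairness_tol


# Toy data invented for illustration only.
deals = [
    Deal("strong", 4.0, 5), Deal("strong", 2.0, 6),
    Deal("weak", -1.0, 5), Deal("weak", 0.0, 6),
]
print(cohort_gaps(deals, "strong", "weak"))    # (3.5, 0.0)
print(invisible_gap(deals, "strong", "weak"))  # True
```

With this toy data, the "strong" cohort gains $3.50 more per deal while reported fairness is identical, so the check fires. In a real deployment, the outcome metric and the thresholds would have to be chosen for the specific task.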

Key insights

Stronger AI models secure better deals, but users of weaker models often remain unaware of their disadvantage.

Method

Anthropic's "Project Deal" ran parallel marketplaces in which Claude agents of different capability (Opus vs. Haiku) negotiated autonomously for real goods on behalf of employees. The study measured both deal outcomes and participants' satisfaction with those deals.

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Ethicist, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.