Crosby Starts Contract Benchmark, Launches Agent Research Group
Summary
NewMod law firm Crosby has launched "Multi-turn Negotiation Bench" (Redline), a contract negotiation benchmark for AI outputs, alongside Crosby Intelligence, a research arm focused on legal agents. The benchmark evaluates frontier models on senior commercial lawyers' workflows, measuring negotiation as a sequence of judgment calls. Initial findings show ChatGPT 5.5 performing best at 50.5%, followed by Claude Fable 5 at 47.3%, Gemini 3.5 Flash at 45.1%, and Claude Opus 4.8 at 44.4%. Human lawyers still excel at finding "new routes to resolution," where AI models tend to get stuck. Crosby Intelligence aims to build "agentic attorneys" for the firm, releasing benchmarks for legal judgment domains and focusing on simulating negotiations to reduce time-to-signature from weeks to hours. Crosby has raised over $85 million in funding.
Key takeaway
For Directors of AI/ML evaluating legal tech investments, Crosby's initiatives underscore that while AI models are improving, human judgment remains critical for complex contract negotiations. Your teams should focus on solutions that use AI for efficiency in high-volume, fixed-fee tasks, but keep human oversight wherever a negotiation calls for "new routes to resolution," the point at which the models tend to get stuck. Consider benchmarking AI tools against multi-turn negotiation workflows to identify where human-in-the-loop processes are indispensable.
Key insights
Crosby's new benchmark and agent research highlight AI's current negotiation limits and future potential in legal workflows.
Principles
- Contract negotiation requires sequential judgment.
- AI excels at isolated edits, not "new routes."
- Automation boosts fixed-fee legal models.
Method
The "Multi-turn Negotiation Bench" measures AI performance in contract negotiation by evaluating a sequence of judgment calls, including understanding deal context, commercial leverage, legal edits, and anticipating counterparty responses.
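To make the idea of scoring "a sequence of judgment calls" concrete, here is a minimal sketch of how a team might grade a multi-turn negotiation in its own internal evaluation. It is not Crosby's scoring method, which this summary does not describe. The rubric dimension names, the pass/fail grading, and the simple averaging are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical rubric: one entry per judgment call named in the summary.
DIMENSIONS = (
    "deal_context",
    "commercial_leverage",
    "legal_edits",
    "counterparty_response",
)


@dataclass
class Turn:
    """One negotiation turn, graded pass/fail on each rubric dimension."""
    grades: dict[str, bool]


def turn_score(turn: Turn) -> float:
    """Fraction of rubric dimensions judged correct on this turn."""
    # A dimension missing from the grades counts as a fail (an assumption).
    return mean(1.0 if turn.grades.get(d, False) else 0.0 for d in DIMENSIONS)


def negotiation_score(turns: list[Turn]) -> float:
    """Unweighted average of turn scores across the negotiation (0.0 to 1.0)."""
    if not turns:
        raise ValueError("a negotiation needs at least one turn")
    return mean(turn_score(t) for t in turns)


# Illustrative grades only, not real benchmark data.
example = [
    Turn({"deal_context": True, "commercial_leverage": True,
          "legal_edits": True, "counterparty_response": False}),
    Turn({"deal_context": True, "commercial_leverage": False,
          "legal_edits": True, "counterparty_response": False}),
]
print(f"{negotiation_score(example):.1%}")  # prints 62.5%
```

An unweighted average treats every turn and every dimension as equally important. A real evaluation would likely need weights, or a penalty for an early mistake that derails later turns.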
In practice
- Integrate human judgment for complex negotiation.
- Explore agentic AI for high-volume contract work.
- Benchmark AI tools against specific legal workflows.
Topics
- Legal AI
- Contract Negotiation
- AI Benchmarking
- Legal Agents
- Law Firm Automation
- Generative AI
Best for: AI Engineer, NLP Engineer, Research Scientist, Legal Professional, AI Scientist, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Lawyer.