Maybe AI agents can be lawyers after all

· Source: AI News & Artificial Intelligence | TechCrunch · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Intermediate, quick

Summary

Anthropic's Opus 4.6 model has markedly improved AI agent performance on professional tasks, as measured by Mercor's APEX-Agents benchmark. In earlier results, AI agents scored under 25% on tasks drawn from fields such as law and corporate analysis, which led to the conclusion that human professionals were safe from immediate displacement. Opus 4.6 scored nearly 30% in one-shot trials and averaged 45% when given multiple attempts, up from the previous top score of 18.4%. The improvement is attributed partly to new agentic features, including "agent swarms," designed for multistep problem-solving. Mercor CEO Brendan Foody noted the rapid progress, which suggests that foundation model development is still moving quickly.

Key takeaway

For CTOs and VPs of Engineering assessing AI integration into professional workflows, the rapid gains of models like Anthropic's Opus 4.6 on the APEX-Agents benchmark signal a need to re-evaluate AI's potential impact. Rather than dismissing AI for professional roles on the basis of older benchmark results, have your teams explore advanced agentic features, such as "agent swarms," to see how well they apply to the complex, multistep professional tasks in your organization.

Key insights

AI agent performance on professional tasks is rapidly improving, challenging previous assumptions about human job security.

Method

Mercor's APEX-Agents benchmark measures AI agent performance on professional tasks in fields such as law and corporate analysis, reporting results for both one-shot and multi-attempt trials.
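
To make the gap between a one-shot score and a multi-attempt score concrete, here is a minimal sketch. The source does not say how Mercor aggregates multiple attempts, so this assumes one possible reading: a task counts as solved if any attempt succeeds. The task names, outcomes, and function names are all hypothetical and are not taken from APEX-Agents.

```python
from typing import Dict, List

# Hypothetical pass/fail outcomes per task, one entry per attempt.
# These are illustrative values, not APEX-Agents data.
results: Dict[str, List[bool]] = {
    "contract_review": [False, True, False],
    "merger_memo": [True, True, False],
    "tax_analysis": [False, False, False],
    "due_diligence": [False, False, True],
}


def one_shot_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Fraction of tasks solved on the first attempt only."""
    return sum(attempts[0] for attempts in outcomes.values()) / len(outcomes)


def any_attempt_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Fraction of tasks solved by at least one attempt (assumed aggregation)."""
    return sum(any(attempts) for attempts in outcomes.values()) / len(outcomes)


print(f"one-shot:    {one_shot_rate(results):.0%}")
print(f"any attempt: {any_attempt_rate(results):.0%}")
```

Under this assumption a multi-attempt score can never fall below the one-shot score, which is why the two figures should be read separately and not compared directly with a single earlier result.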

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Product Manager, Tech Journalist, Legal Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by AI News & Artificial Intelligence | TechCrunch.