4 tips for building better AI agents that your business can trust

· Source: News and Advice on the World's Latest Innovations | ZDNET · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation, Data Science & Analytics · Depth: Intermediate, medium

Summary

Thomson Reuters Labs, led by CTO Joel Hron, is integrating AI agents into its information services, using both in-house models and off-the-shelf tools. The company aims to turn human expertise into judgment delivered through mechanisms that keep evolving, increasingly agents and systems that pair agents with software. Notable results include the AI-powered legal research tool Westlaw Advantage and a Deep Research agent. Hron emphasizes four lessons for building trustworthy agentic AI. First, measure success rigorously with public and internal benchmarks, backed by human expert validation. Second, foster collaboration between designers and data scientists to create a common language and interfaces for human-AI interaction. Third, extend agent capabilities by decomposing existing software applications into tools that agents can call. Fourth, engage with external partners such as the Trust in AI Alliance and Imperial College London to reach high accuracy levels (99% and 99.9%) and to ensure explainability and transparency.

Key takeaway

For AI/ML Directors evaluating agentic AI adoption, prioritize establishing robust evaluation frameworks that combine automated metrics with essential human expert validation. Your teams should focus on decomposing existing, proven software capabilities into agent-accessible tools to extend functionality, rather than expecting agents to be omniscient. Foster deep collaboration between UX designers and data scientists to ensure intuitive human-agent interfaces and shared understanding of agent operations.
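One way to picture "decomposing existing software capabilities into agent-accessible tools" is a small registry that wraps functions an application already has. The sketch below is illustrative only: the `ToolRegistry` class, the tool names, and the in-memory `_DOCUMENTS` store are all hypothetical and are not taken from the article or from any Thomson Reuters system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    """A single capability of an existing application, exposed to an agent."""
    name: str
    description: str
    run: Callable[..., object]


class ToolRegistry:
    """Holds the tools an agent is allowed to call (hypothetical design)."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, name: str, description: str):
        """Decorator that wraps an existing function as an agent tool."""
        def decorator(func):
            if name in self._tools:
                raise ValueError(f"duplicate tool name: {name}")
            self._tools[name] = Tool(name, description, func)
            return func
        return decorator

    def describe(self) -> list[str]:
        """Tool summaries that could be placed in an agent's prompt."""
        return [f"{t.name}: {t.description}" for t in self._tools.values()]

    def call(self, name: str, **kwargs) -> object:
        """Dispatch an agent's tool request; unknown tools fail loudly."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].run(**kwargs)


registry = ToolRegistry()

# Stand-in for data behind a feature that already exists in a proven application.
_DOCUMENTS = {
    "doc-1": "Contract law overview",
    "doc-2": "Tax filing deadlines",
}


@registry.register("search_documents", "Find document ids whose title contains a keyword.")
def search_documents(keyword: str) -> list[str]:
    return sorted(k for k, title in _DOCUMENTS.items() if keyword.lower() in title.lower())


@registry.register("get_title", "Return the title of a document by id.")
def get_title(doc_id: str) -> str:
    return _DOCUMENTS[doc_id]
```

In this arrangement the agent never needs to "know" the documents itself. It asks for `registry.call("search_documents", keyword="tax")` and relies on behavior the application has already proven, which is the point of the takeaway above.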

Key insights

Building trustworthy AI agents requires rigorous measurement, cross-functional collaboration, and leveraging existing proven capabilities.

Method

Measure agent success using public benchmarks, internal benchmarks with defined "good" answers, and human expert validation. Foster collaboration between designers and data scientists to create shared human-agent interfaces.
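The measurement idea can be sketched as a small evaluation loop: score the agent against internal benchmark cases that carry an agreed "good" answer, and route anything the automated check cannot confirm to a human expert. This is a minimal sketch under stated assumptions, not the method described in the article: `BenchmarkCase`, `EvalReport`, `evaluate`, and the normalized exact-match check are hypothetical choices, and a real setup would likely use a richer scoring rule.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCase:
    question: str
    good_answer: str  # the agreed "good" answer for this internal benchmark item


@dataclass
class EvalReport:
    total: int
    auto_passed: int
    needs_expert_review: list  # (case, agent_answer) pairs routed to a human expert

    @property
    def auto_pass_rate(self) -> float:
        return self.auto_passed / self.total if self.total else 0.0


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not fail a case."""
    return " ".join(text.lower().split())


def evaluate(agent, cases) -> EvalReport:
    """Score an agent on benchmark cases; send every mismatch to expert review."""
    passed = 0
    review_queue = []
    for case in cases:
        answer = agent(case.question)
        if normalize(answer) == normalize(case.good_answer):
            passed += 1
        else:
            # Exact match is a crude automated check, so a human decides mismatches.
            review_queue.append((case, answer))
    return EvalReport(total=len(cases), auto_passed=passed, needs_expert_review=review_queue)
```

Here `agent` is any callable that takes a question string and returns an answer string. Keeping the expert review queue separate from the automated pass rate makes it clear which results were validated by a person and which were only checked by the script.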

Best for: VP of Engineering/Data, Director of AI/ML, AI Architect, CTO, AI Product Manager, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by News and Advice on the World's Latest Innovations | ZDNET.