4 tips for building better AI agents that your business can trust
Summary
Thomson Reuters Labs, led by CTO Joel Hron, is integrating AI agents into its information services, using both in-house models and off-the-shelf tools. The company aims to distill human expertise into judgment that it delivers through evolving mechanisms, increasingly agents and systems that combine agents with software. Notable results include the AI-powered legal research tool Westlaw Advantage and a Deep Research agent. Hron emphasizes four lessons for building trustworthy agentic AI. First, measure success rigorously with public and internal benchmarks, backed by human expert validation. Second, foster collaboration between designers and data scientists to create a common language and interfaces for human-AI interaction. Third, extend agent capabilities by decomposing existing software applications into tools that agents can use. Fourth, engage with external partners such as the Trust in AI Alliance and Imperial College London to pursue high accuracy (99% and 99.9%) and to ensure explainability and transparency.
Key takeaway
For AI/ML Directors evaluating agentic AI adoption, prioritize establishing robust evaluation frameworks that combine automated metrics with essential human expert validation. Your teams should focus on decomposing existing, proven software capabilities into agent-accessible tools to extend functionality, rather than expecting agents to be omniscient. Foster deep collaboration between UX designers and data scientists to ensure intuitive human-agent interfaces and shared understanding of agent operations.
Key insights
Building trustworthy AI agents requires rigorous measurement, cross-functional collaboration, and reuse of existing, proven capabilities.
Principles
- Define "good" for automated evaluations.
- Human experts are critical for final validation.
- Decompose existing software into agent tools.
Method
Measure agent success using public benchmarks, internal benchmarks with defined "good" answers, and human expert validation. Foster collaboration between designers and data scientists to create shared human-agent interfaces.
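The measurement approach above can be sketched in code. This is a minimal illustration, not Thomson Reuters' actual evaluation framework: the names `BenchmarkCase`, `EvalResult`, `run_benchmark`, and `accuracy` are hypothetical, and the automated check (the "good" answer must appear in the agent's answer) is a deliberately simple stand-in for whatever scoring a real team would define.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BenchmarkCase:
    question: str
    good_answer: str  # the agreed definition of "good" for this case


@dataclass
class EvalResult:
    case: BenchmarkCase
    agent_answer: str
    auto_pass: bool                     # outcome of the automated check
    expert_pass: Optional[bool] = None  # None until a human expert reviews it


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def run_benchmark(agent: Callable[[str], str],
                  cases: list[BenchmarkCase]) -> list[EvalResult]:
    """Run the agent on every case and apply the automated check."""
    results = []
    for case in cases:
        answer = agent(case.question)
        # Assumed scoring rule: the "good" answer must appear in the response.
        auto_pass = normalize(case.good_answer) in normalize(answer)
        results.append(EvalResult(case, answer, auto_pass))
    return results


def accuracy(results: list[EvalResult], require_expert: bool = False) -> float:
    """Fraction of cases that pass; optionally require expert sign-off too."""
    if not results:
        return 0.0
    passed = 0
    for r in results:
        ok = r.auto_pass
        if require_expert:
            ok = ok and r.expert_pass is True
        passed += ok
    return passed / len(results)
```

Reporting `accuracy(results)` alongside `accuracy(results, require_expert=True)` keeps the automated score and the expert-validated score separate, so a gap between the two shows where the automated definition of "good" disagrees with expert judgment.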
In practice
- Develop internal benchmarks for AI agent performance.
- Integrate human experts into the evaluation loop.
- Adapt existing software as tools for AI agents.
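As a rough sketch of the last point, existing functions can be wrapped as named, described tools that an agent calls by name. Everything here is an assumption for illustration: `search_cases` and `count_words` stand in for capabilities of an existing application (the case names are placeholders), and `ToolRegistry` is a hypothetical wrapper, not an API from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Any, Callable


# Stand-ins for capabilities that already exist in a proven application.
def search_cases(query: str) -> list[str]:
    corpus = ["Smith v. Jones (contract)", "Doe v. Roe (tort)"]  # placeholder data
    return [c for c in corpus if query.lower() in c.lower()]


def count_words(text: str) -> int:
    return len(text.split())


@dataclass
class Tool:
    name: str
    description: str  # what the agent reads when deciding which tool to use
    func: Callable[..., Any]


class ToolRegistry:
    """Hypothetical registry that exposes existing functions as agent tools."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, name: str, description: str,
                 func: Callable[..., Any]) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        self._tools[name] = Tool(name, description, func)

    def describe(self) -> list[dict[str, str]]:
        """Return the tool catalogue an agent would be shown."""
        return [{"name": t.name, "description": t.description}
                for t in self._tools.values()]

    def call(self, name: str, **kwargs: Any) -> Any:
        """Dispatch a tool call requested by the agent."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].func(**kwargs)
```

The agent never reimplements search or counting; it only chooses a tool from `describe()` and supplies arguments to `call()`, so the proven code path stays the source of truth.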
Topics
- AI Agents
- Trustworthy AI
- AI Evaluation
- Human-AI Collaboration
- Generative AI
Best for: VP of Engineering/Data, Director of AI/ML, AI Architect, CTO, AI Product Manager, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by ZDNET.