How agent o11y differs from traditional o11y — Phil Hetzel, Braintrust

· Source: AI Engineer · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Software Development & Engineering · Depth: Intermediate, long

Summary

Phil Hetzel of Braintrust explains how agent observability differs from traditional observability, which focuses mainly on uptime and technical performance metrics such as latency and error rates. Agent observability, in contrast, must contend with the non-deterministic behavior of LLM agents, which calls for qualitative metrics such as grounding, tool usage, and brand alignment. Agent traces are more complex, semi-structured, voluminous (often over a gigabyte, with 20MB spans), and fast-moving, so they need database designs built for ingestion, indexing, and full-text search. Braintrust's custom database, which uses a forked Tantivy index, is the example given. Effective agent observability also involves a wider range of personas, including non-technical subject matter experts such as clinicians or lawyers, who help improve agent performance through natural language prompts and human annotation workflows. Braintrust is also developing LLM-driven topic modeling and sentiment analysis on traces to shorten the iteration loop between identifying production problems and implementing fixes.

Key takeaway

If you are an AI Engineer or AI Product Manager deploying generative AI agents, recognize that traditional observability tools are insufficient. You need an agent observability platform that handles non-deterministic behavior, processes complex, voluminous trace data, and integrates feedback from non-technical domain experts. Prioritize solutions with robust indexing and full-text search so you can diagnose agent performance efficiently and shorten your iteration cycles.

Key insights

Agent observability requires specialized approaches due to LLM non-determinism, complex trace data, and diverse stakeholder involvement.

Method

Braintrust developed a custom database with write-ahead logs, indexing, and a Tantivy-based full-text index to manage large, semi-structured agent traces for real-time and analytical queries.
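
The combination described above (log the write first, then index it for search) can be illustrated with a small sketch. This is not Braintrust's implementation: Tantivy is a Rust search library, and here it is swapped for a toy in-memory inverted index so the example stays self-contained. The class `TraceStore`, its methods, and the span fields are all hypothetical names chosen for illustration.

```python
import json
import os
import re
import tempfile
from collections import defaultdict


class TraceStore:
    """Toy span store (hypothetical): append-only write-ahead log
    plus an in-memory inverted index for full-text search."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.spans = {}                 # span id -> span dict
        self.index = defaultdict(set)   # token -> set of span ids
        if os.path.exists(wal_path):
            self._replay()

    @staticmethod
    def _tokens(text):
        # Lowercase alphanumeric word tokens; a real index would do far more.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def _apply(self, span):
        # Flatten the semi-structured span to JSON text and index every token
        # (field names are indexed too, which is fine for a sketch).
        self.spans[span["id"]] = span
        for token in self._tokens(json.dumps(span)):
            self.index[token].add(span["id"])

    def _replay(self):
        # Rebuild in-memory state from the log after a restart.
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    self._apply(json.loads(line))

    def write(self, span):
        # Durably append to the log first, then update the index.
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(span) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(span)

    def search(self, query):
        # AND semantics: return ids of spans containing every query token.
        tokens = self._tokens(query)
        if not tokens:
            return []
        matches = set.intersection(*(self.index.get(t, set()) for t in tokens))
        return sorted(matches)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "spans.wal")
    store = TraceStore(path)
    store.write({"id": "s1", "tool": "search_docs", "output": "Refund policy found"})
    store.write({"id": "s2", "tool": "send_email", "output": "Refund issued"})
    print(store.search("refund"))

    # A fresh instance recovers its state purely from the log.
    recovered = TraceStore(path)
    print(recovered.search("refund policy"))
```

The sketch ignores everything that makes the real problem hard: multi-megabyte spans, concurrent writers, on-disk index segments, and updates to an existing span (rewriting a span here would leave stale index entries). It only shows why the two pieces pair naturally: the log makes ingestion cheap and recoverable, and the index makes free-text questions over semi-structured payloads answerable.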

Best for: AI Architect, AI Engineer, MLOps Engineer, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.