Your Data, Your Lake: How Observe Uses Iceberg and Streaming ETL for Observability

2026-01-18 · Source: Data Engineering Podcast · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Cloud Computing & IT Infrastructure · Depth: Advanced, extended

Summary

Observe cofounder and CTO Jacob Leverich discusses applying lakehouse architectures to observability workloads, emphasizing cloud-native warehousing and open table formats like Iceberg for scalability and cost efficiency. He highlights how this approach, combined with streaming ingest via OpenTelemetry, Kafka-backed durability, curated/columnarized tables, and query orchestration, addresses common pain points such as fragmented tools, high costs, and data silos. The system delivers low-latency, interactive troubleshooting across logs, metrics, and traces at petabyte scale. Leverich also details the practicalities of organizing telemetry by use case to minimize read amplification and the significance of Iceberg v3's JSON shredding capabilities, enabling a "your data in your lake" strategy.

Key takeaway

CTOs and AI architects evaluating observability solutions should consider a lakehouse architecture such as Observe's. It centralizes diverse telemetry data, reduces costs, and speeds up troubleshooting by building on open table formats and streaming ETL. Teams gain unified access to petabytes of data, which can improve MTTR and enable AI-driven analytics without the constraints of fragmented, expensive legacy systems.

Key insights

Lakehouse architectures can provide scalable, cost-efficient observability by centralizing diverse telemetry data.

Principles

Organize data by use case to minimize read amplification.
Streaming ETL is crucial for low-latency observability.
Open table formats enable data ownership and multi-tool access.
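
To make the first principle concrete, here is a minimal sketch of how read amplification could be estimated for two table layouts. It is an illustration, not Observe's method: the `Table` class, the table names, and the volumes are all hypothetical, and the model assumes a query scans every table that contains the kind of data it wants, with no file or partition pruning.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """A table and the bytes it stores per telemetry kind (hypothetical model)."""
    name: str
    bytes_by_kind: dict[str, int]

    def total_bytes(self) -> int:
        return sum(self.bytes_by_kind.values())


def read_amplification(tables: list[Table], kind: str) -> float:
    """Bytes scanned divided by bytes needed to answer a query on one kind.

    Assumes a full scan of every table holding that kind (no pruning).
    """
    scanned = sum(t.total_bytes() for t in tables if kind in t.bytes_by_kind)
    needed = sum(t.bytes_by_kind.get(kind, 0) for t in tables)
    if needed == 0:
        raise ValueError(f"no data of kind {kind!r} in these tables")
    return scanned / needed


# Hypothetical volumes in GB, chosen only for illustration.
mixed = [Table("all_logs", {"vpc_flow": 100, "app": 700, "audit": 200})]
curated = [
    Table("vpc_flow_logs", {"vpc_flow": 100}),
    Table("app_logs", {"app": 700}),
    Table("audit_logs", {"audit": 200}),
]

mixed_amp = read_amplification(mixed, "vpc_flow")      # scans 1000 GB for 100 GB
curated_amp = read_amplification(curated, "vpc_flow")  # scans 100 GB for 100 GB
```

Under these assumed numbers the mixed table reads ten times more data than the query needs, while the per-use-case layout reads only what is needed. Real engines prune files and columns, so actual ratios would differ.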

Method

Ingest OpenTelemetry data via Kafka for durability and batching, then stream-process it into curated, columnarized Iceberg tables. Use a query orchestration layer that turns each request into an optimized sequence of SQL queries to keep performance interactive.
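
The batching and columnarization steps of the method can be sketched in plain Python. This is a simplified stand-in under stated assumptions: an in-memory list plays the role of records consumed from a Kafka topic, the field names are hypothetical, and the column-oriented dictionaries stand in for the data files a real pipeline would write to an Iceberg table.

```python
from typing import Iterable, Iterator


def micro_batches(records: Iterable[dict], max_rows: int) -> Iterator[list[dict]]:
    """Group a record stream into batches of at most max_rows rows."""
    batch: list[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) >= max_rows:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


def columnarize(batch: list[dict]) -> dict[str, list]:
    """Pivot row-oriented records into one list per column.

    Fields missing from a row are filled with None.
    """
    columns = sorted({key for row in batch for key in row})
    return {col: [row.get(col) for row in batch] for col in columns}


# Stand-in for records read from a Kafka topic (hypothetical fields).
stream = [
    {"ts": 1, "service": "api", "level": "info"},
    {"ts": 2, "service": "api", "level": "error", "status": 500},
    {"ts": 3, "service": "db", "level": "info"},
]

column_batches = [columnarize(b) for b in micro_batches(stream, max_rows=2)]
```

A production pipeline would also flush on a time limit so that a slow stream does not hold records back, which is the usual trade-off between file size and ingest latency.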

In practice

Deploy OpenTelemetry collectors for vendor-neutral data collection.
Use Kafka to buffer incoming data and load it into the lakehouse in efficient batches.
Curate data into specific tables (e.g., VPC flow logs) to optimize queries.
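
As a rough sketch of the curation step, the routing below sends each incoming record to a use-case-specific table. Everything here is assumed for illustration: the rule predicates, the field names (`source`, `trace_id`, `span_id`, `metric_name`), and the table names are hypothetical and are not taken from Observe or from the OpenTelemetry schema.

```python
from collections import defaultdict

# Hypothetical routing rules: (predicate on a record, curated table name).
# Rules are checked in order and the first match wins.
ROUTES = [
    (lambda r: r.get("source") == "aws.vpc_flow", "vpc_flow_logs"),
    (lambda r: "trace_id" in r and "span_id" in r, "spans"),
    (lambda r: "metric_name" in r, "metrics"),
]
DEFAULT_TABLE = "raw_events"  # catch-all for records no rule claims


def route(record: dict) -> str:
    """Return the curated table name for a single record."""
    for matches, table in ROUTES:
        if matches(record):
            return table
    return DEFAULT_TABLE


def curate(records: list[dict]) -> dict[str, list[dict]]:
    """Group records by the curated table they belong to."""
    tables: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        tables[route(record)].append(record)
    return dict(tables)


sample = [
    {"source": "aws.vpc_flow", "bytes": 1200},
    {"trace_id": "t1", "span_id": "s1", "name": "GET /"},
    {"metric_name": "cpu.util", "value": 0.42},
    {"message": "unclassified"},
]
curated_tables = curate(sample)
```

Keeping a catch-all table means unrecognized data is retained rather than dropped, and new rules can be added later as query patterns become clear.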

Topics

Observability
Lakehouse Architecture
Apache Iceberg
Streaming ETL
OpenTelemetry

Best for: CTO, VP of Engineering/Data, AI Architect, Data Engineer, MLOps Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering Podcast.