Holding Kafka Right: Product-Friendly Streaming with TypeStream
Summary
Jevin Maltais, drawing on experience from Zapier, Humi, and Clio, discusses the practical challenges and common misuses of building reliable, product-focused streaming systems with Kafka. He advocates using Kafka's full potential: events as a source of truth, materialized views via KTables, and schema registries and type safety to prevent downstream issues. Maltais notes that many teams adopt Kafka without using powerful features like Kafka Streams, Connect, or interactive queries. His project, TypeStream, addresses this with a config-as-code approach that makes these advanced Kafka capabilities more accessible while keeping the abstraction layer thin. The discussion also explores trade-offs among Kafka-compatible alternatives like Redpanda and AutoMQ, the real-world application of Change Data Capture with Debezium, and how TypeStream simplifies complex data pipelines, such as transforming Postgres data for Elasticsearch.
Key takeaway
For software engineers or data engineers building product-focused streaming systems, TypeStream offers a simplified path to leverage Kafka's advanced features without deep operational expertise. If you're struggling with Kafka's complexity or underutilizing its capabilities like KTables and Kafka Streams, consider TypeStream's config-as-code approach. This allows you to define robust data pipelines with type safety, reducing the need for custom microservices and enabling real-time data synchronization and materialized views more efficiently.
Key insights
Kafka's full potential, including Streams and KTables, is often underutilized despite its power for product-focused streaming.
Principles
- Events should serve as the single source of truth.
- Schema registries and type safety prevent downstream breakage.
- Abstractions must include clear escape hatches for customization.
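To make the type-safety principle concrete, here is a minimal sketch of checking an event against a schema before it is published. It is plain Python rather than a real schema registry client; the `USER_CREATED_SCHEMA` name, its fields, and the `validate` helper are all hypothetical and chosen only for illustration.

```python
from typing import Any

# Hypothetical schema for an illustrative "user created" event:
# field name -> required Python type. Not taken from any real registry.
USER_CREATED_SCHEMA: dict[str, type] = {"user_id": int, "email": str}


def validate(event: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of schema violations (empty if the event conforms)."""
    errors = []
    for field, expected in schema.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            actual = type(event[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


good = {"user_id": 42, "email": "a@example.com"}
bad = {"user_id": "42"}  # wrong type for user_id, and email is missing

print(validate(good, USER_CREATED_SCHEMA))
print(validate(bad, USER_CREATED_SCHEMA))
```

Rejecting the second event at the producer is the point of the principle: consumers downstream never see a record they cannot parse.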
Method
TypeStream uses config-as-code (JSON) to define Kafka pipelines. It compiles those definitions to Java Kafka Streams applications, manages topics, and deploys connectors for transforming and syncing data.
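The general idea of config-as-code can be sketched as a declarative JSON document that is turned into an executable pipeline. The config keys below (`source`, `steps`, `op`, `sink`) and the `compile_pipeline` function are invented for this illustration and are not TypeStream's actual format or API; the sketch also runs over in-memory records instead of Kafka topics.

```python
import json
from typing import Any, Callable, Iterable

# Hypothetical pipeline config. The keys are assumptions made for this
# sketch and do NOT reflect TypeStream's real schema.
CONFIG = json.loads("""
{
  "source": "orders",
  "steps": [
    {"op": "filter", "field": "status", "equals": "paid"},
    {"op": "project", "fields": ["order_id", "total"]}
  ],
  "sink": "paid_orders"
}
""")

Record = dict[str, Any]


def compile_pipeline(config: dict[str, Any]) -> Callable[[Iterable[Record]], list[Record]]:
    """Turn the declarative steps into one function over in-memory records."""

    def run(records: Iterable[Record]) -> list[Record]:
        out = list(records)
        for step in config["steps"]:
            if step["op"] == "filter":
                # Keep only records whose field matches the configured value.
                out = [r for r in out if r.get(step["field"]) == step["equals"]]
            elif step["op"] == "project":
                # Keep only the listed fields of each record.
                out = [{k: r[k] for k in step["fields"]} for r in out]
            else:
                raise ValueError(f"unknown op: {step['op']}")
        return out

    return run


pipeline = compile_pipeline(CONFIG)
orders = [
    {"order_id": 1, "status": "paid", "total": 10},
    {"order_id": 2, "status": "open", "total": 5},
]
print(pipeline(orders))
```

The pipeline author only edits the JSON; the translation into running code lives in one place, which is what lets such a layer stay thin.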
In practice
- Use Debezium to expose database changes as Kafka events.
- Materialize views with KTables for real-time queryable data.
- Integrate Kafka Connect for seamless data syncing to various targets.
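The first two practices above can be illustrated together: a stream of Debezium-style change events is folded into a keyed table holding the latest row per key, which is roughly what a KTable materializes. This is a plain-Python stand-in, not the Kafka Streams API. The envelope is simplified to the `op`, `before`, and `after` fields (real Debezium events carry more metadata), and the `apply_change` helper and sample rows are hypothetical.

```python
from typing import Any, Optional

Row = dict[str, Any]


def apply_change(view: dict[int, Row], key: int, envelope: Optional[Row]) -> None:
    """Fold one change event into the view using upsert/delete semantics."""
    if envelope is None or envelope["op"] == "d":
        # A delete event, or the null-valued tombstone that follows it,
        # removes the key from the table.
        view.pop(key, None)
    else:
        # "c" (create), "u" (update), and "r" (snapshot read) all upsert
        # the row state found in "after".
        view[key] = envelope["after"]


# Simplified change events as (key, value) pairs, in log order.
events: list[tuple[int, Optional[Row]]] = [
    (1, {"op": "c", "before": None, "after": {"id": 1, "name": "Ada"}}),
    (2, {"op": "c", "before": None, "after": {"id": 2, "name": "Bob"}}),
    (1, {"op": "u", "before": {"id": 1, "name": "Ada"},
         "after": {"id": 1, "name": "Ada L."}}),
    (2, {"op": "d", "before": {"id": 2, "name": "Bob"}, "after": None}),
    (2, None),  # tombstone for key 2
]

view: dict[int, Row] = {}
for key, envelope in events:
    apply_change(view, key, envelope)

print(view)
```

Because the view is derived entirely from the event log, it can be rebuilt at any time by replaying the events from the start, which is what makes the log usable as the source of truth.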
Topics
- Kafka
- Streaming Data
- TypeStream
- Schema Registry
- Change Data Capture
- Data Pipelines
Best for: Data Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering Podcast.