How Zalando built a unified data foundation for AI and analytics on Databricks
Summary
This article details an architecture for managing the full lifecycle of data metrics: definition in YAML, automated validation, human review, and individual development environments. Each metric is defined in a YAML file that captures its aggregation logic, table relationships, owner, description, and formatting. A CI/CD pipeline then validates every definition automatically, checking uniqueness, naming conformity (e.g., snake_case), and ownership. The architecture follows dimensional modeling principles: Metric Views are built on a Star Schema in a Lakehouse, each Metric View maps 1-to-1 to a Fact table, and it inherits attributes from conformed Dimension tables. Because Metric Views sit on the underlying platform, they inherit its security and compliance controls, such as those provided by Unity Catalog. The system aims for interoperability, making metrics available in Databricks Dashboards, in AI-powered analysis tools such as Genie, and in external applications through standardized connectors. Metric Views are central to improving the accuracy of conversational AI analytics: they supply governed business definitions, reduce guesswork when generating SQL, and enable context-aware interpretations.
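The CI/CD validation step can be sketched in Python as follows. This is a minimal illustration, not Zalando's pipeline: it assumes each YAML file has already been parsed into a dictionary, and the field names (`name`, `expr`, `owner`, `description`) and the `validate_metrics` helper are hypothetical. A CI job built this way would fail the build whenever the returned list is non-empty.

```python
import re

# Hypothetical convention: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

# Hypothetical required fields; real definitions also carry joins and formatting.
REQUIRED_FIELDS = ("name", "expr", "owner", "description")


def validate_metrics(metrics: list[dict]) -> list[str]:
    """Return human-readable errors; an empty list means every check passed."""
    errors: list[str] = []
    seen: set[str] = set()
    for i, metric in enumerate(metrics):
        label = metric.get("name") or f"<metric #{i}>"
        # Ownership and completeness: every required field must be non-empty.
        for key in REQUIRED_FIELDS:
            if not metric.get(key):
                errors.append(f"{label}: missing required field '{key}'")
        name = metric.get("name")
        if name:
            # Naming conformity.
            if not SNAKE_CASE.match(name):
                errors.append(f"{label}: name is not snake_case")
            # Uniqueness across all definitions checked in this run.
            if name in seen:
                errors.append(f"{label}: duplicate metric name")
            seen.add(name)
    return errors


metrics = [
    {"name": "net_revenue", "expr": "SUM(net_amount)",
     "owner": "team-finance", "description": "Revenue after returns."},
    {"name": "NetRevenue", "expr": "SUM(net_amount)",
     "owner": "", "description": "Same metric, wrong casing, no owner."},
]
for err in validate_metrics(metrics):
    print(err)
# NetRevenue: missing required field 'owner'
# NetRevenue: name is not snake_case
```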
Key takeaway
For Data Engineers and MLOps Engineers building analytical platforms, adopting a metric lifecycle management architecture with YAML definitions and automated validation is critical. Combined with a semantic layer of Metric Views, this approach enforces data governance and improves the accuracy and trustworthiness of AI-powered analytics tools such as Genie, shortening time-to-insight and preventing dashboard sprawl. Prioritize integrating Metric Views with your conversational AI solutions so that they are grounded in consistent, reliable business definitions.
Key insights
A governed semantic layer built on Metric Views substantially improves the accuracy of AI analytics and makes metrics interoperable across tools.
Principles
- Define metrics in code (YAML).
- Enforce automated validation via CI/CD.
- Base Metric Views on dimensional models.
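To make the dimensional-modeling principle concrete, the sketch below uses Python's built-in `sqlite3` as a stand-in for a Lakehouse. It is an analogy, not Databricks Metric View syntax: the tables `fact_orders` and `dim_customer`, the view `orders_star`, and the `MEASURES`/`DIMENSIONS` registries are all hypothetical. One view sits on one fact table and inherits attributes from a conformed dimension; measures are defined once as aggregation expressions, and the consumer picks the dimension at query time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (            -- conformed dimension
    customer_id INTEGER PRIMARY KEY,
    country     TEXT NOT NULL,
    segment     TEXT NOT NULL
);
CREATE TABLE fact_orders (             -- fact table: one row per order
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES dim_customer (customer_id),
    net_amount  REAL NOT NULL
);
INSERT INTO dim_customer VALUES (1, 'DE', 'new'), (2, 'FR', 'returning'), (3, 'DE', 'returning');
INSERT INTO fact_orders VALUES (10, 1, 50.0), (11, 1, 30.0), (12, 2, 20.0), (13, 3, 15.0);

-- One view per fact table; it inherits attributes from the dimension.
CREATE VIEW orders_star AS
SELECT f.order_id, f.net_amount, d.country, d.segment
FROM fact_orders AS f
JOIN dim_customer AS d ON d.customer_id = f.customer_id;
""")

# Measures are defined once; dimensions are chosen by the consumer.
MEASURES = {"net_revenue": "SUM(net_amount)", "order_count": "COUNT(*)"}
DIMENSIONS = {"country", "segment"}


def query_metric(measure: str, dimension: str) -> list[tuple]:
    """Aggregate a governed measure by an allowed dimension."""
    if measure not in MEASURES or dimension not in DIMENSIONS:
        raise KeyError(f"unknown measure or dimension: {measure}, {dimension}")
    # Safe to interpolate: both names were checked against the registries above.
    sql = (f"SELECT {dimension}, {MEASURES[measure]} FROM orders_star "
           f"GROUP BY {dimension} ORDER BY {dimension}")
    return conn.execute(sql).fetchall()


print(query_metric("net_revenue", "country"))  # [('DE', 95.0), ('FR', 20.0)]
print(query_metric("order_count", "segment"))  # [('new', 2), ('returning', 2)]
```

Keeping measures separate from the grouping dimension is what lets a single definition of `net_revenue` be reused across every attribute inherited from the joined dimension tables.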
Method
Define metrics in YAML, validate them via CI/CD, review every change under a four-eyes principle (a second person must approve it), and deploy to individual development environments. Build Metric Views on a Star Schema to form a unified semantic layer.
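Two parts of this workflow, the four-eyes review gate and the isolated development target, can be sketched as plain functions. Both are hypothetical: `ChangeRequest`, `passes_four_eyes`, `dev_schema`, and the `dev_<user>` schema naming convention are illustrative assumptions, not part of the original article or any Databricks API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Hypothetical record of a proposed change to metric definitions."""
    author: str
    approvers: set[str] = field(default_factory=set)


def passes_four_eyes(change: ChangeRequest) -> bool:
    """Four-eyes principle: at least one approver who is not the author."""
    return bool(change.approvers - {change.author})


def dev_schema(catalog: str, user: str) -> str:
    """Hypothetical convention: one isolated schema per developer."""
    # Replace anything outside [a-z0-9_] so the result is a simple identifier.
    safe_user = re.sub(r"[^a-z0-9_]", "_", user.lower())
    return f"{catalog}.dev_{safe_user}"


print(passes_four_eyes(ChangeRequest("alice", {"alice"})))         # False
print(passes_four_eyes(ChangeRequest("alice", {"alice", "bob"})))  # True
print(dev_schema("analytics", "Jane.Doe@example.com"))  # analytics.dev_jane_doe_example_com
```

Requiring an approver distinct from the author, rather than merely counting approvals, is what separates a four-eyes rule from a self-approved merge.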
In practice
- Use YAML for metric definitions.
- Implement CI/CD for metric validation.
- Integrate Metric Views with conversational AI.
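One way to ground a conversational assistant in governed definitions is to pass only the relevant metric definitions to the model as context. The sketch below is a hypothetical illustration of that idea, not how Genie works internally; the `synonyms` field, the `relevant_metrics` and `grounding_context` helpers, and the whole-word keyword matching are all assumptions. A production system would likely use richer matching.

```python
import re

# Hypothetical governed definitions; "synonyms" is an assumed field.
METRICS = [
    {"name": "net_revenue", "expr": "SUM(net_amount)", "owner": "team-finance",
     "description": "Revenue after returns.", "synonyms": ["net sales", "revenue"]},
    {"name": "order_count", "expr": "COUNT(*)", "owner": "team-ops",
     "description": "Number of orders placed.", "synonyms": ["orders"]},
]


def relevant_metrics(question: str, metrics: list[dict]) -> list[dict]:
    """Select metrics whose name or synonyms appear as whole words in the question."""
    q = question.lower()
    hits = []
    for m in metrics:
        terms = [m["name"].replace("_", " ")] + [s.lower() for s in m.get("synonyms", [])]
        if any(re.search(rf"\b{re.escape(t)}\b", q) for t in terms):
            hits.append(m)
    return hits


def grounding_context(question: str, metrics: list[dict]) -> str:
    """Render the matching governed definitions as context for an LLM prompt."""
    lines = ["Answer using only these governed metric definitions:"]
    for m in relevant_metrics(question, metrics):
        lines.append(f"- {m['name']}: {m['description']} "
                     f"(SQL: {m['expr']}; owner: {m['owner']})")
    return "\n".join(lines)


print(grounding_context("What were net sales by country last month?", METRICS))
```

Sending only the matched definitions keeps the prompt small and ties each answer to an owned, reviewed expression instead of SQL the model improvises.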
Topics
- Unified Data Foundation
- Metric Lifecycle Management
- Databricks Lakehouse
- Semantic Layer
- Conversational AI Analytics
Best for: Data Engineer, MLOps Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.