Implementation of SCD2 On Truncate-Load Table With No Unique Column

Source: Data Engineering on Medium · Field: Technology & Digital — Data Science & Analytics, Cloud Computing & IT Infrastructure · Depth: Intermediate, long

Summary

This article describes a method for implementing Slowly Changing Dimension Type 2 (SCD2) in a Databricks Silver Delta table when the source Bronze Delta table has no unique identifier or audit columns and is reloaded daily by truncate-load. Delta Lake's Change Data Feed (CDF) is shown to be insufficient for tracking inserts, updates, and deletes under this strategy, because after each truncation it registers every reloaded record as an "insert" event. The proposed solution creates three temporary views: one for records to be updated (the existing versions, which are marked inactive), one for records to be inserted (brand-new records plus the new versions of updated records), and one for records to be marked as deleted (no longer present in the source). The views are combined into a final dataset that drives a MERGE INTO operation upserting into the Silver Delta table, which preserves history through the `eff_start_tms`, `eff_end_tms`, and `active_flag` columns.

Key takeaway

For data engineers managing Databricks Delta tables fed by truncate-load Bronze sources, Change Data Feed alone is not enough to implement SCD2. Implement a custom merge strategy that uses temporary views to explicitly identify inserts, updates, and deletions, so that the Silver layer keeps an accurate, auditable history of every record.

Key insights

SCD2 implementation on truncate-load Delta tables requires custom logic beyond standard Change Data Feed.

Principles

Method

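Create three temporary views: one for updates (existing versions to close), one for inserts (new records plus the new versions of updated records), and one for deletions (records missing from the source). Combine them into a final dataset, then use a MERGE INTO statement to apply the changes to the Silver Delta table.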
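The three views and the merge step can be illustrated without Spark. The following is a minimal pure-Python sketch of the classification logic, not the article's actual SQL. It assumes that some combination of columns (the hypothetical `key_cols`) identifies a record within a snapshot even though no single column is unique; the summary does not say how the original article matches rows. The function names and the "Y"/"N" values for `active_flag` are also assumptions made for illustration.

```python
# Columns that carry SCD2 history; names taken from the summary above.
SCD_COLS = ("eff_start_tms", "eff_end_tms", "active_flag")


def key_of(row, key_cols):
    """Composite matching key (assumption: unique within one snapshot)."""
    return tuple(row[c] for c in key_cols)


def business_part(row):
    """Row content without the SCD2 bookkeeping columns."""
    return {c: v for c, v in row.items() if c not in SCD_COLS}


def build_change_sets(silver, bronze, key_cols):
    """Mimic the three temporary views: updates, inserts, deletes."""
    active = {
        key_of(r, key_cols): business_part(r)
        for r in silver
        if r["active_flag"] == "Y"
    }
    incoming = {key_of(r, key_cols): dict(r) for r in bronze}

    # View 1: active rows whose content differs in the new snapshot.
    updates = {k for k in active if k in incoming and active[k] != incoming[k]}
    # View 3: active rows that no longer appear in the source.
    deletes = {k for k in active if k not in incoming}
    # View 2: brand-new rows plus the new versions of updated rows.
    inserts = [
        row for k, row in incoming.items() if k not in active or k in updates
    ]
    return updates, inserts, deletes


def apply_scd2(silver, bronze, key_cols, load_tms):
    """Combine the three sets and apply them, in the spirit of MERGE INTO."""
    updates, inserts, deletes = build_change_sets(silver, bronze, key_cols)
    to_close = updates | deletes

    merged = []
    for row in silver:
        if row["active_flag"] == "Y" and key_of(row, key_cols) in to_close:
            # Close the old version instead of overwriting it.
            row = {**row, "eff_end_tms": load_tms, "active_flag": "N"}
        merged.append(row)
    for row in inserts:
        # Open a new active version with an open-ended end timestamp.
        merged.append(
            {**row, "eff_start_tms": load_tms, "eff_end_tms": None, "active_flag": "Y"}
        )
    return merged
```

In this sketch, unchanged rows are left untouched, updated and deleted rows are closed by setting `eff_end_tms` and flipping `active_flag`, and every new version is appended as a fresh active row. In a Delta table, the closing step would correspond to the matched branch of MERGE INTO and the append to the not-matched branch.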

Best for: Data Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering on Medium.