PIPELINE
Summary
Data engineering has shifted from traditional ETL (Extract, Transform, Load) to ELT (Extract, Load, Transform), driven by the economics of cloud computing. Historically, data was transformed before loading because storage was expensive and compute was limited. Cloud services have made storage nearly free and compute elastic, removing both constraints. Even so, many practitioners kept using ETL out of habit, which led to inefficiencies. The article illustrates this with a pipeline for mobile application event data (user clicks, page views, and feature interactions). The pipeline performed tasks such as JSON parsing and hourly aggregation, and it ran for eight hours instead of its intended forty-five minutes.
Key takeaway
For data engineers building or optimizing cloud-based data pipelines, the shift from ETL to ELT is critical. Traditional ETL processes, which transform data before loading, are likely inefficient and costly now that cloud storage is cheap and compute is elastic. Re-evaluate existing long-running pipelines and prioritize loading raw data directly into your cloud data warehouse before transforming it, which can improve performance and reduce operational overhead.
Key insights
Cloud economics make ELT superior to traditional ETL for modern data pipelines.
Principles
- Cloud storage is cheap and compute is elastic.
- In the cloud, data transformation should follow loading.
- Re-evaluate habits against new technological realities.
In practice
- Migrate ETL pipelines to ELT in the cloud.
- Load raw data directly into cloud storage.
- Re-architect long-running transformation jobs.
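The load-then-transform order described above can be sketched in a few lines. This is a minimal illustration, not the article's pipeline: it uses Python's built-in `sqlite3` as a stand-in for a cloud data warehouse, and the table names (`raw_events`, `hourly_event_counts`), the event fields (`user_id`, `event`, `ts`), and the sample events are all hypothetical. It also assumes the SQLite build includes the JSON functions (`json_extract`), which is the default in recent versions.

```python
import sqlite3

# Hypothetical raw events, shaped like what a mobile app might emit.
RAW_EVENTS = [
    '{"user_id": "u1", "event": "click", "ts": "2024-01-01T10:05:00"}',
    '{"user_id": "u2", "event": "page_view", "ts": "2024-01-01T10:40:00"}',
    '{"user_id": "u3", "event": "click", "ts": "2024-01-01T10:55:00"}',
    '{"user_id": "u1", "event": "click", "ts": "2024-01-01T11:15:00"}',
]


def load_raw(conn, events):
    """Load step: store each event untouched, as raw JSON text."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO raw_events (payload) VALUES (?)",
        [(event,) for event in events],
    )


def transform_hourly(conn):
    """Transform step: parse JSON and aggregate per hour inside the database."""
    conn.execute("DROP TABLE IF EXISTS hourly_event_counts")
    conn.execute(
        """
        CREATE TABLE hourly_event_counts AS
        SELECT
            -- The first 13 characters of an ISO timestamp are 'YYYY-MM-DDTHH'.
            substr(json_extract(payload, '$.ts'), 1, 13) AS hour_bucket,
            json_extract(payload, '$.event') AS event_name,
            COUNT(*) AS event_count
        FROM raw_events
        GROUP BY hour_bucket, event_name
        """
    )
    return conn.execute(
        "SELECT hour_bucket, event_name, event_count "
        "FROM hourly_event_counts ORDER BY hour_bucket, event_name"
    ).fetchall()


def run_pipeline(events):
    """Run load, then transform, against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        load_raw(conn, events)
        return transform_hourly(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_pipeline(RAW_EVENTS):
        print(row)
```

Because the raw payloads stay in `raw_events`, the aggregation can be changed and re-run without extracting the data again, which is the main practical difference from transforming before the load.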
Topics
- Data Engineering
- ETL
- ELT
- Cloud Computing
- Data Pipelines
- Data Transformation
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Data Engineer, Analytics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering on Medium.