From Data Engineering to AI Engineering: Where the Lines Blur
Summary
Tobias Macey, host of the Data Engineering Podcast, discusses how AI has reshaped data engineering since 2017 and blurred the lines between data, ML, and AI engineering. The discipline, which emerged from the Hadoop and cloud warehouse eras to support data science, now has to handle more unstructured data, new data assets such as vector databases and knowledge graphs, and stricter reliability demands from interactive, user-facing AI applications. Key shifts include tighter cross-functional collaboration, faster dataset onboarding, evolving governance and access controls, and the integration of experimentation and evaluation into core testing practices. The episode also covers how AI models are changing data processing and the operational characteristics of data systems.
Key takeaway
If you are a VP of Engineering or Data leading AI initiatives, your teams need to integrate AI engineering practices into existing data engineering workflows. This means building skills for managing unstructured data, adopting new data stores such as vector databases, and treating rapid experimentation and evaluation as core operational practices. Success depends on tighter collaboration between data, ML, and application engineering to meet the faster pace and stricter SLAs of AI-driven products.
Key insights
AI is blurring data engineering boundaries, demanding new data types, faster delivery, and integrated experimentation.
Principles
- Data engineering's core purpose remains transforming raw information into useful knowledge.
- Experimentation and evaluation are now fundamental testing practices across the data workflow.
- AI models can process unstructured data into structured assets.
Method
Data engineers must integrate language models and probabilistic technologies into traditionally deterministic workflows, manage new data assets like vector embeddings, and adapt to real-time SLAs for interactive AI applications.
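One way to picture a probabilistic step inside a deterministic workflow is to wrap the model call in a fixed schema check and score it against a small labeled set before it is allowed into the pipeline. The sketch below is illustrative only and is not taken from the episode: `stub_extract` is a hypothetical stand-in for a language model call, and the field names, the `evaluate` helper, and the 0.9 threshold are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Assumed schema for this illustration only.
REQUIRED_FIELDS = {"customer", "amount"}


def validate(record: Dict) -> bool:
    """Deterministic schema check applied to a model's output."""
    return (
        REQUIRED_FIELDS.issubset(record)
        and isinstance(record["customer"], str)
        and isinstance(record["amount"], (int, float))
    )


def evaluate(extract: Callable[[str], Dict],
             labeled: List[Tuple[str, Dict]]) -> float:
    """Fraction of labeled examples whose output is valid and matches."""
    if not labeled:
        return 0.0
    correct = 0
    for text, expected in labeled:
        output = extract(text)
        if validate(output) and output == expected:
            correct += 1
    return correct / len(labeled)


def stub_extract(text: str) -> Dict:
    """Hypothetical stand-in for a language model extraction call.

    Parses strings shaped like '<name> paid <number>'.
    """
    parts = text.split()
    try:
        return {"customer": parts[0], "amount": float(parts[-1])}
    except (IndexError, ValueError):
        return {}


def passes_gate(score: float, threshold: float = 0.9) -> bool:
    """Block promotion of the step when the score is below the threshold."""
    return score >= threshold
```

In a real pipeline the stub would be replaced by an actual model call, and the labeled set and threshold would come from the team's own evaluation data.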
In practice
- Use vector databases to retrieve context for AI models at inference time.
- Use AI models to convert unstructured data into structured formats.
- Build experimentation and evaluation into AI-driven data pipelines.
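As a rough illustration of retrieval at inference time, the sketch below ranks stored embeddings by cosine similarity to a query vector. It is a toy, in-memory stand-in for a vector database and is not from the episode: the `top_k` helper, the dictionary index, and the hand-written two-dimensional vectors are assumptions, and a production system would use an embedding model and a dedicated vector store.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 2) -> List[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings, written by hand for illustration.
index = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.0, 1.0],
    "doc_c": [0.9, 0.1],
}
nearest = top_k([1.0, 0.0], index, k=2)
```

The ids returned by `top_k` would then be used to fetch the matching documents and pass them to the model as context.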
Topics
- AI Engineering
- Data Engineering Evolution
- Vector Databases
- Knowledge Graphs
- MLOps
Best for: CTO, VP of Engineering/Data, Director of AI/ML, Data Engineer, MLOps Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering Podcast.