Don’t upgrade your Airflow DB in-place - Sharing lessons from our Airflow 3 migration

· Source: Data Engineering on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Cloud Computing & IT Infrastructure · Depth: Intermediate, quick

Summary

A recent production migration from Apache Airflow 2.9 to 3.1 showed that Airflow 3 is a significant rearchitecture rather than a simple version upgrade, and the work turned into a multi-day infrastructure project. The webserver is gone: `airflow api-server` (a FastAPI + React application) replaces it, so startup scripts, Dockerfiles, and health checks all need updating, with the health endpoint moving from `/health` to `/api/v2/monitor/health`. Authentication was also rewritten. Flask-AppBuilder (FAB) and the `airflow users create` command are no longer part of the default setup; Airflow 3 instead uses SimpleAuthManager, which reads user credentials from a flat JSON file. Because containers are ephemeral, that file has to be regenerated at every container boot.
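As an illustration of the health check change, here is a minimal Python sketch of a probe against the new endpoint. The path `/api/v2/monitor/health` comes from the summary above; the response shape (one object per component, each with a `status` field) and the choice of `metadatabase` and `scheduler` as required components are assumptions to verify against your own deployment.

```python
import json
import urllib.request

# Airflow 3 health path named in the summary (Airflow 2 used /health).
HEALTH_PATH = "/api/v2/monitor/health"


def health_url(base_url: str) -> str:
    """Build the health URL from a base such as http://localhost:8080."""
    return base_url.rstrip("/") + HEALTH_PATH


def unhealthy_components(payload: dict, required=("metadatabase", "scheduler")) -> list:
    """Return the required components whose status is not 'healthy'.

    Assumes a payload like {"scheduler": {"status": "healthy"}, ...};
    a missing or null component counts as unhealthy.
    """
    bad = []
    for name in required:
        status = (payload.get(name) or {}).get("status")
        if status != "healthy":
            bad.append(name)
    return bad


def check(base_url: str, timeout: float = 5.0) -> list:
    """Fetch the health endpoint and return the list of unhealthy components."""
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return unhealthy_components(payload)
```

A container health check could call `check("http://localhost:8080")` and exit non-zero when the returned list is not empty; the host and port here are placeholders.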

Key takeaway

For MLOps Engineers planning an Airflow upgrade, recognize that Airflow 3.x is a rearchitecture, not a minor update. You must account for changes to the webserver command, health check endpoints, and a completely rewritten authentication system that relies on flat JSON files for user management. Plan for a multi-day infrastructure project, including script and Dockerfile updates, rather than a simple `pip install --upgrade`.

Key insights

Airflow 3 is a rearchitecture, not an incremental update, requiring significant infrastructure changes.

Principles

Method

Airflow 3 replaces the webserver with `airflow api-server` and shifts authentication to a flat JSON file for SimpleAuthManager.
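One way to regenerate the credentials file at every container boot is a small script run by the entrypoint before `airflow api-server` starts. The sketch below is not the original article's script. It assumes the file is a JSON object mapping usernames to passwords, and it reads credentials from environment variables with a hypothetical `AIRFLOW_UI_USER_` prefix invented for this example. The file location, and any matching user and role declarations in the Airflow configuration, should be checked against the SimpleAuthManager documentation for your version.

```python
import json
import os
import tempfile
from pathlib import Path


def users_from_env(environ, prefix="AIRFLOW_UI_USER_") -> dict:
    """Collect {username: password} from variables like AIRFLOW_UI_USER_ADMIN=secret.

    The prefix is a hypothetical convention for this sketch, not an Airflow setting.
    """
    users = {}
    for key, value in environ.items():
        if key.startswith(prefix) and value:
            users[key[len(prefix):].lower()] = value
    return users


def write_passwords_file(users: dict, path: Path) -> None:
    """Write the credentials as JSON, replacing any existing file atomically."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp creates the temp file readable and writable by the owner only.
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(users, fh)
        os.replace(tmp, path)  # atomic rename within the same directory
    except BaseException:
        os.unlink(tmp)
        raise
```

An entrypoint could then call `write_passwords_file(users_from_env(os.environ), Path(target))`, where `target` is whatever path your Airflow configuration points SimpleAuthManager at. Writing to a temporary file first means a crash mid-write never leaves a half-written credentials file behind.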

In practice

Topics

Best for: MLOps Engineer, Data Engineer, DevOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering on Medium.