Designing for “Future Me”: Docker Compose and AWS
Summary
This article details a Docker Compose setup designed for a data platform, emphasizing reproducibility and quick resumption of work after breaks. The author outlines a stack comprising eight long-running services like Postgres, Redis, and Airflow, alongside two "run-once" tasks for ingestion and dbt transformations. Key patterns include using Docker Compose profiles to differentiate continuous services from on-demand tasks, implementing explicit wait loops (e.g., pg_isready) to ensure service readiness beyond basic healthchecks, and leveraging ENTRYPOINT with CMD for flexible, single-purpose container invocations. The analysis also maps this open-source stack to AWS equivalents, noting that the "task vs. service" distinction becomes a critical cost-saving architectural decision in a cloud environment.
Key takeaway
For MLOps Engineers building data platforms, prioritizing development reproducibility is crucial for sustained productivity. Implement Docker Compose profiles to manage ephemeral tasks like data ingestion or dbt runs, preventing unnecessary resource consumption. Explicitly verify service readiness beyond basic healthchecks, such as using `pg_isready` for databases, to avoid flaky startup issues. This approach ensures your environment is consistently ready, minimizing setup time and maximizing focus on actual work.
Key insights
A well-structured Docker Compose setup ensures development reproducibility and rapid project resumption.
Principles
- Distinguish services from tasks.
- "Started" is not "ready."
- Lock each container to one job, but keep its arguments flexible.
Method
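Use Docker Compose profiles to separate on-demand tasks from long-running services. Add explicit `pg_isready` checks so dependents wait until the database accepts connections. Define `ENTRYPOINT` for the fixed executable and `CMD` for the default arguments that callers can override.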
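The readiness check described above can be sketched as a small POSIX shell retry loop. This is a minimal sketch, not the original article's script: the `wait_for` function name, the attempt count and delay, and the host, port, and user in the usage comment are all hypothetical.

```shell
#!/bin/sh
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 or ATTEMPTS is exhausted.
# Returns 0 once COMMAND succeeds, 1 if every attempt fails.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed: $*" >&2
        # Sleep only when another attempt remains.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Hypothetical usage inside a task container (host, port, and user are assumptions):
# wait_for 30 2 pg_isready -h postgres -p 5432 -U app && exec dbt run
```

Passing the probe as an argument lets the same loop wrap any readiness command, not only `pg_isready`.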
In practice
- Use Docker Compose `profiles` for one-off jobs.
- Add `pg_isready` loops for Postgres startup.
- Set `ENTRYPOINT ["dbt"]` for dbt containers.
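To illustrate the `ENTRYPOINT` plus `CMD` idea from the last item, the shell function below models the behaviour rather than invoking Docker: the executable is fixed, and the default arguments apply only when the caller supplies none. It is a sketch under stated assumptions. The `job` function and the `JOB_EXECUTABLE` and `JOB_DEFAULT_ARGS` variables are hypothetical names, and `run` as the default argument is an assumption, since the summary does not state the article's `CMD`.

```shell
#!/bin/sh
# Plays the role of ENTRYPOINT ["dbt"]: the executable is fixed.
JOB_EXECUTABLE=${JOB_EXECUTABLE:-dbt}
# Plays the role of CMD ["run"]: default arguments (assumed, not from the article).
JOB_DEFAULT_ARGS=${JOB_DEFAULT_ARGS:-run}

job() {
    if [ "$#" -eq 0 ]; then
        # No arguments supplied: fall back to the defaults, as CMD does.
        # Left unquoted on purpose so a multi-word default splits into separate arguments.
        set -- $JOB_DEFAULT_ARGS
    fi
    # Any supplied arguments replace the defaults entirely; the executable never changes.
    "$JOB_EXECUTABLE" "$@"
}
```

With this model, calling `job` with no arguments is equivalent to `dbt run`, while `job test --select staging` is equivalent to `dbt test --select staging`. This mirrors how arguments given to `docker run <image>` replace the image's `CMD` but not its exec-form `ENTRYPOINT`.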
Topics
- Docker Compose
- AWS Architecture
- Data Platform
- Reproducibility
- Container Orchestration
- MLOps
Best for: Machine Learning Engineer, MLOps Engineer, Data Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering on Medium.