AWS Glue Local Development Setup Using Podman on Windows (Step-by-Step Guide)
Summary
This guide details a step-by-step process for setting up a local AWS Glue PySpark development environment on Windows using Podman instead of Docker Desktop. The setup involves installing Windows Subsystem for Linux (WSL) with Ubuntu, followed by Podman and Podman Desktop. Key configuration steps include creating a Podman machine within WSL, enabling necessary extensions like Compose and Docker, and configuring proxy settings for corporate networks. The process culminates in pulling a specific AWS Glue container image, such as `public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01`, and running it locally to enable testing of Glue jobs, PySpark scripts, and schema changes without repeated deployments to AWS. This approach offers benefits like rootless operation, no daemon requirement, and compatibility with corporate network restrictions.
Key takeaway
For Machine Learning Engineers or Data Engineers developing AWS Glue PySpark jobs on Windows, this Podman-based local setup can shorten the development cycle. You can test transformations and validate schema changes locally, without deploying to AWS for each iteration, which also makes debugging easier. The setup is particularly useful in corporate environments with strict network policies or Docker Desktop licensing concerns.
Key insights
Podman provides a secure, lightweight, and open-source alternative to Docker Desktop for local AWS Glue PySpark development on Windows.
Principles
- Containerization enables consistent local development environments.
- Rootless containers enhance security and resource management.
Method
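Install WSL2 with Ubuntu, then install Podman and Podman Desktop for Windows. In Podman Desktop, create a Podman machine, enable the Compose and Docker extensions, and set the proxy if your network requires one. Then pull the AWS Glue image and run a container from it to test PySpark jobs locally.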
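The command-line part of this method can be sketched roughly as follows. This is a minimal sketch, not the original article's script: it assumes WSL2 and Podman are already installed, it leaves the Podman Desktop extension and proxy settings to the GUI, and the `run` helper and `DRY_RUN` variable are hypothetical names introduced here so the commands can be previewed before anything is executed.

```shell
# Image tag taken from the guide (Glue 4.0).
GLUE_IMAGE="public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01"

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Create and start the Podman machine, then pull the Glue image.
setup_glue_podman() {
  run podman machine init        # create the Podman machine (WSL-backed on Windows)
  run podman machine start       # start the machine
  run podman pull "$GLUE_IMAGE"  # fetch the Glue 4.0 image
}
```

Calling `setup_glue_podman` on its own only prints the three commands. Running it after `export DRY_RUN=0` executes them, which is a cautious default on a machine where a Podman machine may already exist.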
In practice
- Use `podman pull public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01` for Glue 4.0.
- Mount the local workspace with `-v /mnt/c/testing:/home/glue_user/workspace`.
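Put together, the pull and mount options might combine into a single `podman run` invocation along these lines. This is a sketch under assumptions: the container-side path `/home/glue_user/workspace`, the `DISABLE_SSL=true` setting, and the `spark-submit` entry command follow AWS's documented examples for this image rather than the summary above, `sample.py` is a hypothetical script name, and AWS credentials are not mounted. The function `glue_submit_cmd` is a name introduced here; it only prints the command so it can be inspected first.

```shell
# Image tag and host path taken from the guide; override HOST_WORKSPACE as needed.
GLUE_IMAGE="public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01"
HOST_WORKSPACE="${HOST_WORKSPACE:-/mnt/c/testing}"

# Assumption: workspace location inside the Glue container, per AWS's examples.
CONTAINER_WORKSPACE="/home/glue_user/workspace"

# Print the podman command that submits one script from the mounted workspace.
# $1 is the script path relative to the workspace (hypothetical, e.g. sample.py).
glue_submit_cmd() {
  echo podman run -it --rm \
    -v "${HOST_WORKSPACE}:${CONTAINER_WORKSPACE}" \
    -e DISABLE_SSL=true \
    "${GLUE_IMAGE}" \
    spark-submit "${CONTAINER_WORKSPACE}/$1"
}
```

For example, `glue_submit_cmd sample.py` prints the full command for a script saved in the mounted workspace folder, which can then be pasted into a terminal once the Podman machine is running.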
Topics
- AWS Glue
- Podman
- Local Development
- PySpark
- WSL
Best for: Machine Learning Engineer, Data Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering on Medium.