AWS Glue Local Development Setup Using Podman on Windows (Step-by-Step Guide)

Source: Data Engineering on Medium · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering) · Depth: Intermediate, short

Summary

This guide walks through setting up a local AWS Glue PySpark development environment on Windows using Podman instead of Docker Desktop. The setup starts with installing Windows Subsystem for Linux (WSL2) with Ubuntu, followed by Podman and Podman Desktop. Key configuration steps include creating a Podman machine within WSL, enabling the Compose and Docker extensions, and configuring proxy settings for corporate networks. The final steps are pulling an AWS Glue container image, such as `public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01`, and running it locally, so that Glue jobs, PySpark scripts, and schema changes can be tested without repeated deployments to AWS. Podman runs rootless and without a background daemon, and the setup works within corporate network restrictions.
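As a rough illustration of the final "run it locally" step, the sketch below assembles a `podman run` command for the Glue image named above. The helper `build_glue_run_command`, the container name `glue_pyspark`, the container-side path `/home/glue_user/workspace/`, the Spark UI port `4040`, and the trailing `pyspark` argument are assumptions modeled on AWS's published examples for this image, not details confirmed by the summary. Passing the proxy as `HTTP_PROXY`/`HTTPS_PROXY` environment variables is likewise just one possible way to handle a corporate proxy.

```python
from typing import List, Optional

# Image tag taken from the summary above.
GLUE_IMAGE = "public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01"


def build_glue_run_command(
    workspace: str,
    proxy_url: Optional[str] = None,
    image: str = GLUE_IMAGE,
) -> List[str]:
    """Return the argv for an interactive PySpark session in the Glue image.

    Hypothetical helper: the mount target, port, and container name are
    assumptions, so adjust them to match your own setup.
    """
    cmd = [
        "podman", "run", "-it", "--rm",
        "--name", "glue_pyspark",                          # assumed name
        "-v", f"{workspace}:/home/glue_user/workspace/",  # assumed mount path
        "-p", "4040:4040",                                 # Spark UI port
    ]
    if proxy_url:
        # Forward the corporate proxy into the container, if one is needed.
        for var in ("HTTP_PROXY", "HTTPS_PROXY"):
            cmd += ["-e", f"{var}={proxy_url}"]
    # The image name comes last, followed by the command to run inside it.
    cmd += [image, "pyspark"]
    return cmd
```

Calling `build_glue_run_command("/mnt/c/work/glue")` returns a list that can be handed to `subprocess.run`, or joined with spaces and pasted into a terminal once the Podman machine is running.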

Key takeaway

For Machine Learning Engineers and Data Engineers developing AWS Glue PySpark jobs on Windows, this Podman-based local setup can shorten the development cycle. You can test transformations and validate schema changes on your own machine, which reduces reliance on repeated AWS deployments and makes debugging easier. The setup is particularly useful in corporate environments with strict network policies or Docker Desktop licensing concerns.

Key insights

Podman provides a secure, lightweight, and open-source alternative to Docker Desktop for local AWS Glue PySpark development on Windows.

Method

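Install WSL2 with Ubuntu, then Podman for Windows and Podman Desktop. In Podman Desktop, create a Podman machine, enable the Compose and Docker extensions, and configure proxy settings if your network requires them. Then pull the AWS Glue image and run it as a container to test PySpark jobs locally.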
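The command-line portion of this method can be sketched as an ordered list of steps. This is a minimal sketch, not the original article's script: the helper names `setup_steps` and `run_steps` are hypothetical. It assumes Podman for Windows is already installed, and it leaves out the parts done through the Podman Desktop interface (extensions and proxy settings). By default it only prints the commands (a dry run) so they can be reviewed before anything is executed.

```python
import shlex
import subprocess
from typing import List, Tuple

# Image tag taken from the summary above.
GLUE_IMAGE = "public.ecr.aws/glue/aws-glue-libs:glue_libs_4.0.0_image_01"

Step = Tuple[str, List[str]]


def setup_steps(image: str = GLUE_IMAGE) -> List[Step]:
    """Return (description, argv) pairs for the CLI part of the setup."""
    return [
        ("Install WSL2 with the Ubuntu distribution",
         ["wsl", "--install", "-d", "Ubuntu"]),
        ("Create the Podman machine",
         ["podman", "machine", "init"]),
        ("Start the Podman machine",
         ["podman", "machine", "start"]),
        ("Pull the AWS Glue image",
         ["podman", "pull", image]),
    ]


def run_steps(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for description, argv in steps:
        line = shlex.join(argv)
        rendered.append(line)
        print(f"# {description}")
        print(line)
        if not dry_run:
            # Stop at the first failing step instead of continuing.
            subprocess.run(argv, check=True)
    return rendered
```

Running `run_steps(setup_steps())` prints the four commands in order. Passing `dry_run=False` executes them, which is only sensible from a Windows terminal where `wsl` and `podman` are on the path.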

Best for: Machine Learning Engineer, Data Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Data Engineering on Medium.