Article: Two Misconfigurations That Caused Spark OOM Failures on Kubernetes

2026-06-03 · Source: InfoQ · Field: Technology & Digital — Cloud Computing & IT Infrastructure, Data Science & Analytics, Software Development & Engineering · Depth: Advanced, long

Summary

An article published on June 3, 2026, details how two infrastructure misconfigurations caused repeated Spark Out-Of-Memory (OOM) failures in a financial institution's batch pipelines on Azure Kubernetes Service (AKS). After the pipelines were migrated from on-premises, a 3GB shuffle-intensive job that had run stably for years began failing. There were two root causes: `spark.kubernetes.local.dirs.tmpfs=true`, which backed the shuffle spill directories with node RAM (on 64GB nodes), and a hard `podAffinity` rule that forced all four Spark 3.4.0 executors onto a single node. In addition, the `tmp-volume` and `workdir` volumes were undersized at 1Gi each. Together these settings exhausted node memory and led to OOM kills. The fix was to set `spark.kubernetes.local.dirs.tmpfs=false`, increase both volumes to 10Gi, and replace `podAffinity` with `podAntiAffinity` so that executors are distributed across nodes. The system has been stable for the six months since.

Key takeaway

MLOps and data engineers migrating Spark workloads to Kubernetes should explicitly audit infrastructure configurations. A cloud migration can silently introduce `tmpfs`-backed scratch directories or restrictive `podAffinity` rules, which lead to OOM failures under production load. Validate that `spark.kubernetes.local.dirs.tmpfs` is `false`, ensure disk-backed scratch volumes are large enough (e.g., 10Gi), and configure `podAntiAffinity` to prevent executor co-location.

Key insights

Cloud migrations can introduce silent infrastructure misconfigurations that cause Spark OOM failures under production load.

Principles

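`spark.kubernetes.local.dirs.tmpfs=true` backs shuffle spill directories with node RAM, so heavy spill can exhaust memory.
A hard `podAffinity` rule that co-locates executors concentrates their memory pressure on a single node.
After a migration, validate how the platform backs scratch storage and where it schedules executors.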
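
A back-of-envelope sketch of why the two settings compound. Only the 64GB node size and the count of four executors come from the article; the per-executor heap, overhead, and spill figures below are hypothetical values chosen for illustration, and the additive model is a simplification.

```python
def node_memory_demand_gb(executors_on_node, executor_memory_gb,
                          overhead_gb, tmpfs_spill_gb):
    """Rough RAM demand on one node.

    With tmpfs-backed local dirs, spilled shuffle data occupies RAM,
    so it is added to each executor's footprint.
    """
    per_executor = executor_memory_gb + overhead_gb + tmpfs_spill_gb
    return executors_on_node * per_executor


NODE_RAM_GB = 64  # node size reported in the article

# Hypothetical per-executor figures: 10GB heap, 2GB overhead, 5GB spill.
co_located = node_memory_demand_gb(4, 10, 2, 5)  # all four executors on one node, tmpfs on
spread_out = node_memory_demand_gb(1, 10, 2, 0)  # one executor per node, spill on disk

print(f"co-located with tmpfs: {co_located}GB of {NODE_RAM_GB}GB")
print(f"spread out, disk-backed: {spread_out}GB of {NODE_RAM_GB}GB")
```

Under these assumed numbers, co-location with RAM-backed spill exceeds the node's memory, while one executor per node with disk-backed spill stays well within it.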

Method

Diagnose Spark OOMs by checking `spark.kubernetes.local.dirs.tmpfs`, `podAffinity` rules, and scratch volume sizes. Use `podAntiAffinity` and disk-backed volumes for shuffle spill.
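
The checks above can be sketched as a small audit function. This is a minimal illustration, not the article's tooling: it assumes the Spark configuration is available as a dict, that the executor pod spec is a plain dict in Kubernetes field layout, and that the scratch volumes are `emptyDir` volumes with a `sizeLimit` (the article does not state the volume type). The `audit` and `parse_quantity` helpers and the `app: batch` label are hypothetical.

```python
import re

_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity):
    """Convert a binary-suffixed quantity such as '1Gi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)", quantity)
    if not match:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def audit(spark_conf, pod_spec, min_scratch="10Gi"):
    """Return a list of findings for the three settings named above."""
    findings = []
    tmpfs = str(spark_conf.get("spark.kubernetes.local.dirs.tmpfs", "false"))
    if tmpfs.lower() == "true":
        findings.append("tmpfs: local dirs are RAM-backed")
    pod_affinity = pod_spec.get("affinity", {}).get("podAffinity", {})
    if pod_affinity.get("requiredDuringSchedulingIgnoredDuringExecution"):
        findings.append("podAffinity: hard rule can co-locate executors")
    for volume in pod_spec.get("volumes", []):
        limit = (volume.get("emptyDir") or {}).get("sizeLimit")
        if limit and parse_quantity(limit) < parse_quantity(min_scratch):
            findings.append(
                f"volume {volume['name']}: sizeLimit {limit} below {min_scratch}"
            )
    return findings


# Example input mirroring the misconfiguration described in the summary.
bad_conf = {"spark.kubernetes.local.dirs.tmpfs": "true"}
bad_pod = {
    "affinity": {"podAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": [{
            "labelSelector": {"matchLabels": {"app": "batch"}},  # hypothetical label
            "topologyKey": "kubernetes.io/hostname",
        }],
    }},
    "volumes": [
        {"name": "tmp-volume", "emptyDir": {"sizeLimit": "1Gi"}},
        {"name": "workdir", "emptyDir": {"sizeLimit": "1Gi"}},
    ],
}
findings = audit(bad_conf, bad_pod)
for finding in findings:
    print(finding)
```

For the example input, the function reports four findings: the tmpfs setting, the hard affinity rule, and each of the two undersized volumes.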

In practice

Set `spark.kubernetes.local.dirs.tmpfs` to `false`.
Increase `tmp-volume` and `workdir` to 10Gi+.
Use `podAntiAffinity` for executor distribution.
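
One way to express the three changes above, sketched in Python as a Spark configuration dict plus an executor pod spec. Several details are assumptions rather than facts from the article: the volumes are modeled as `emptyDir` volumes, the anti-affinity rule is written as a soft (`preferred`) rule, and executors are selected by a `spark-role: executor` label. The `remediated_settings` and `to_submit_args` helpers are hypothetical names.

```python
import json


def remediated_settings(scratch_size="10Gi"):
    """Build the Spark conf and an executor pod spec with the three fixes."""
    spark_conf = {"spark.kubernetes.local.dirs.tmpfs": "false"}
    pod_template = {
        "apiVersion": "v1",
        "kind": "Pod",
        "spec": {
            "affinity": {"podAntiAffinity": {
                # Soft rule (an assumption): prefer nodes without another executor.
                "preferredDuringSchedulingIgnoredDuringExecution": [{
                    "weight": 100,
                    "podAffinityTerm": {
                        "labelSelector": {
                            "matchLabels": {"spark-role": "executor"},
                        },
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }],
            }},
            # Disk-backed scratch volumes; no "medium: Memory" is set.
            "volumes": [
                {"name": "tmp-volume", "emptyDir": {"sizeLimit": scratch_size}},
                {"name": "workdir", "emptyDir": {"sizeLimit": scratch_size}},
            ],
        },
    }
    return spark_conf, pod_template


def to_submit_args(spark_conf):
    """Flatten a conf dict into spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(spark_conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


conf, template = remediated_settings()
print(" ".join(to_submit_args(conf)))
print(json.dumps(template, indent=2))
```

A soft rule is used here so that executors can still be scheduled when there are fewer nodes than executors; a hard `required` rule would spread them more strictly but can leave pods pending.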

Topics

Spark on Kubernetes
OOM Failures
Cloud Migration
Kubernetes Scheduling
Shuffle Spill
Azure Kubernetes Service

Best for: Data Engineer, MLOps Engineer, DevOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.