Understanding dynamic resource allocation in Kubernetes

2026-07-01 · Source: Cloud Native Computing Foundation · Field: Technology & Digital — Cloud Computing & IT Infrastructure, Artificial Intelligence & Machine Learning · Depth: Intermediate, long

Summary

Kubernetes Dynamic Resource Allocation (DRA), generally available in v1.35, gives finer control over hardware resources such as GPUs. The post, published on July 1, 2026, walks through DRA with NVIDIA's maturing dra-driver-nvidia-gpu in a CNTUG Infra Labs environment running Kubernetes v1.35.3 and containerd 2.2.2 on NVIDIA RTX A5000 and Tesla T10 GPUs. It shows how to install the NVIDIA GPU Operator v26.3.1 and the NVIDIA DRA Driver for GPUs v25.12.0, then explores four practical scenarios: sharing a single GPU across containers, prioritizing specific GPU models (e.g., A5000 over T10) in deployments, requesting GPUs by memory capacity (e.g., >20GiB), and configuring GPU time-slicing for shared access.

Key takeaway

For AI/ML engineers and DevOps teams running GPU-intensive workloads on Kubernetes, DRA in v1.35 allocates GPUs more flexibly and precisely than the legacy device plugin model. You can declaratively specify GPU type, memory, or sharing strategy, which improves resource utilization. Consider migrating existing GPU deployments to DRA for easier management and scaling, but check how rolling updates interact with ResourceClaimTemplates during the transition.

Key insights

Kubernetes DRA provides granular, declarative GPU allocation and overcomes limitations of the older device plugin model.

Principles

DeviceClass categorizes available hardware.
ResourceSlice tracks node-specific device pools.
ResourceClaim requests devices; ResourceClaimTemplate generates a separate ResourceClaim for each Pod.
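
To make the first of these objects concrete, here is a minimal sketch that builds a DeviceClass manifest as a plain Python dict and prints it as JSON, which kubectl accepts alongside YAML. It assumes the `resource.k8s.io/v1` API group and uses `gpu.nvidia.com` as both the class name and the driver name; those names and the `device_class` helper are illustrative assumptions, not taken from the original article.

```python
import json


def device_class(name: str, driver: str) -> dict:
    """Build a DeviceClass that matches every device published by one driver."""
    return {
        "apiVersion": "resource.k8s.io/v1",  # assumed API version for a GA DRA cluster
        "kind": "DeviceClass",
        "metadata": {"name": name},
        "spec": {
            # CEL selector: keep only devices whose driver name matches.
            "selectors": [
                {"cel": {"expression": f"device.driver == '{driver}'"}}
            ]
        },
    }


# Both names are assumptions for illustration.
gpu_class = device_class("gpu.nvidia.com", "gpu.nvidia.com")
print(json.dumps(gpu_class, indent=2))
```

In a real cluster the driver installation normally creates its own DeviceClass, so a hand-written one like this is mainly useful for understanding what a claim's `deviceClassName` refers to.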

Method

Install NVIDIA GPU Operator and DRA Driver, then define ResourceClaims or ResourceClaimTemplates using `exactly` or `firstAvailable` with CEL expressions for precise device selection.
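
The two request styles can be sketched as follows. This is an illustrative example rather than the article's own manifests: it assumes the `resource.k8s.io/v1` API, a DeviceClass named `gpu.nvidia.com`, and that the driver publishes a `productName` attribute and a `memory` capacity under the `gpu.nvidia.com` domain. The helper names and the substring matches on the product name are hypothetical.

```python
import json

DRIVER = "gpu.nvidia.com"        # assumed attribute/capacity domain
DEVICE_CLASS = "gpu.nvidia.com"  # assumed DeviceClass name


def cel(expression: str) -> dict:
    """Wrap a CEL expression in the selector shape used by DRA requests."""
    return {"cel": {"expression": expression}}


def claim_template(name: str, request: dict) -> dict:
    """Build a ResourceClaimTemplate holding a single device request."""
    return {
        "apiVersion": "resource.k8s.io/v1",
        "kind": "ResourceClaimTemplate",
        "metadata": {"name": name},
        "spec": {"spec": {"devices": {"requests": [request]}}},
    }


def exactly(request_name: str, expression: str) -> dict:
    """One request that must be satisfied by a device matching the selector."""
    return {
        "name": request_name,
        "exactly": {
            "deviceClassName": DEVICE_CLASS,
            "selectors": [cel(expression)],
        },
    }


def first_available(request_name: str, alternatives: list) -> dict:
    """Ordered alternatives: the scheduler tries each subrequest in turn."""
    return {
        "name": request_name,
        "firstAvailable": [
            {"name": sub, "deviceClassName": DEVICE_CLASS, "selectors": [cel(expr)]}
            for sub, expr in alternatives
        ],
    }


# Request a GPU with more than 20Gi of memory.
big_memory = claim_template(
    "gpu-over-20gi",
    exactly("gpu", f"device.capacity['{DRIVER}'].memory.compareTo(quantity('20Gi')) > 0"),
)

# Prefer an A5000, fall back to a T10 (product-name matching is an assumption).
prefer_a5000 = claim_template(
    "prefer-a5000",
    first_available("gpu", [
        ("a5000", f"device.attributes['{DRIVER}'].productName.contains('A5000')"),
        ("t10", f"device.attributes['{DRIVER}'].productName.contains('T10')"),
    ]),
)

print(json.dumps([big_memory, prefer_a5000], indent=2))
```

The actual attribute names and values can be checked against the ResourceSlice objects the driver publishes before relying on selectors like these.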

In practice

Share a single GPU among multiple containers.
Prioritize specific GPU models (e.g., A5000).
Request GPUs based on memory capacity (e.g., >20GiB).
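
The first scenario, two containers sharing one GPU, comes down to both containers referencing the same Pod-level claim. The sketch below builds such a Pod manifest as a Python dict; the Pod name, image, command, and the template name `single-gpu` are hypothetical placeholders, and the template is assumed to exist already.

```python
import json


def shared_gpu_pod(pod_name: str, template_name: str, container_names: list) -> dict:
    """Build a Pod whose containers all reference one ResourceClaim."""
    claim_ref = "shared-gpu"  # Pod-local name that ties containers to the claim
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": "ubuntu:22.04",        # placeholder image
                    "command": ["nvidia-smi", "-L"],  # placeholder command
                    # Every container points at the same claim entry.
                    "resources": {"claims": [{"name": claim_ref}]},
                }
                for name in container_names
            ],
            # A single claim, generated for this Pod from the template.
            "resourceClaims": [
                {"name": claim_ref, "resourceClaimTemplateName": template_name}
            ],
        },
    }


pod = shared_gpu_pod("gpu-share-demo", "single-gpu", ["ctr0", "ctr1"])
print(json.dumps(pod, indent=2))
```

Because the Pod declares only one entry under `resourceClaims`, one device is allocated and both containers see it, rather than each container receiving its own GPU.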

Topics

Kubernetes
Dynamic Resource Allocation
GPU Management
NVIDIA GPU Operator
ResourceClaim
ResourceClaimTemplate

Code references

kubernetes-sigs/dra-driver-nvidia-gpu

Best for: Machine Learning Engineer, AI Engineer, DevOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Cloud Native Computing Foundation.