Production LiteLLM on AWS EKS: High Availability with GitOps

Source: Towards AI - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Advanced, long

Summary

A production-grade deployment of LiteLLM on AWS EKS, managed via ArgoCD for GitOps, addresses the complexity of running multiple LLM providers. The architecture unifies access to over 100 LLM providers and offers automatic scaling, granular budget controls, and 99.9% uptime. It is designed to handle 500-1000 requests per second with p95 latency under 150 ms, and it combines a database-enabled LiteLLM proxy, a Horizontal Pod Autoscaler (HPA), a PostgreSQL 17 StatefulSet, Redis for caching, and an internal AWS Application Load Balancer (ALB). Key design decisions include pod anti-affinity, equal resource requests and limits for predictable HPA behavior, and health checks tuned to accommodate Prisma migrations.
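To make the HPA point concrete, the sketch below applies the replica formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). It is illustrative only: the 70% CPU target, the replica bounds, and the sample utilization figures are assumptions, not values from the original article. The 10% tolerance mirrors the Kubernetes default.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=3, max_replicas=10, tolerance=0.1):
    """Approximate the Kubernetes HPA scaling decision for a single metric.

    Utilization is expressed as a percentage of the pod's resource *request*.
    With requests equal to limits, CPU utilization cannot exceed 100% of the
    request, which keeps the ratio below bounded and easier to reason about.
    All default values here are illustrative assumptions.
    """
    ratio = current_utilization / target_utilization
    # The HPA skips scaling when the ratio is within tolerance of 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical readings against an assumed 70% CPU target.
print(desired_replicas(3, 105, 70))  # 5: scale out under load
print(desired_replicas(4, 72, 70))   # 4: within tolerance, no change
print(desired_replicas(4, 35, 70))   # 3: scale in, held at the minimum
```

The real controller also considers pod readiness, missing metrics, and stabilization windows, so treat this as a way to reason about thresholds rather than a reproduction of its behavior.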

Key takeaway

For MLOps engineers or AI architects deploying an LLM gateway in a multi-provider cloud environment, this production-grade LiteLLM architecture on AWS EKS offers a proven blueprint. Consider adopting its GitOps-driven approach with ArgoCD for auditable, self-healing deployments, and reuse its HPA, caching, and database configurations to achieve high availability, cost efficiency, and predictable performance at 500-1000 requests per second.

Key insights

A robust LiteLLM deployment on AWS EKS with GitOps unifies LLM access, ensuring high availability and cost efficiency.

Method

Deploy LiteLLM on AWS EKS using ArgoCD, orchestrating components like HPA, PostgreSQL, Redis, and an internal ALB for high availability and cost optimization.
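As a rough illustration of the design decisions named in the summary (pod anti-affinity, equal requests and limits, and health checks that leave room for Prisma migrations), the sketch below builds a pod-spec fragment as a Python dict and prints it as JSON, which `kubectl` accepts alongside YAML. Every concrete value is an assumption rather than a setting from the original article: the CPU and memory sizes, the `app: litellm` label, the soft (preferred) anti-affinity rule, the probe timings, and the use of LiteLLM's `/health/readiness` endpoint on port 4000. The fragment omits the container image and other required fields.

```python
import json


def litellm_pod_spec_fragment(cpu="1", memory="2Gi", app_label="litellm"):
    """Build an illustrative pod-spec fragment (all values are hypothetical)."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "affinity": {
            "podAntiAffinity": {
                # Soft rule: prefer placing replicas on different nodes.
                "preferredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "weight": 100,
                        "podAffinityTerm": {
                            "labelSelector": {"matchLabels": {"app": app_label}},
                            "topologyKey": "kubernetes.io/hostname",
                        },
                    }
                ]
            }
        },
        "containers": [
            {
                "name": "litellm",
                # Requests equal limits, as the summary recommends for the HPA.
                "resources": {
                    "requests": dict(resources),
                    "limits": dict(resources),
                },
                # Generous delay and threshold so database migrations can
                # finish before the pod is marked unready (assumed timings).
                "readinessProbe": {
                    "httpGet": {"path": "/health/readiness", "port": 4000},
                    "initialDelaySeconds": 60,
                    "periodSeconds": 10,
                    "failureThreshold": 6,
                },
            }
        ],
    }


fragment = litellm_pod_spec_fragment()
print(json.dumps(fragment, indent=2))
```

In a GitOps setup, a fragment like this would live in the Git repository that ArgoCD syncs from rather than being generated at deploy time; the function form is used here only to keep the example short and checkable.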

Best for: MLOps Engineer, AI Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.