Production LiteLLM on AWS EKS: High Availability with GitOps
Summary
A production-grade deployment of LiteLLM on AWS EKS, managed with ArgoCD for GitOps, addresses the complexity of running many LLM providers side by side. The architecture unifies access to more than 100 LLM providers behind a single gateway and offers automatic scaling, granular budget controls, and 99.9% uptime. Designed to handle 500-1000 requests per second with p95 latency under 150 ms, it combines a database-enabled LiteLLM proxy, a Horizontal Pod Autoscaler (HPA), a PostgreSQL 17 StatefulSet, Redis for caching, and an internal AWS Application Load Balancer (ALB). Key design decisions include pod anti-affinity, equal resource requests and limits for predictable HPA behavior, and health checks tuned to accommodate Prisma migrations at startup.
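The pod-level design decisions above can be sketched as a Kubernetes pod template. This is a minimal illustration, not the article's manifest: it builds the template as a Python dict and prints it as JSON, which kubectl also accepts. The CPU and memory values, the image string, the readiness timing, port 4000, and the probe paths `/health/liveliness` and `/health/readiness` (LiteLLM proxy defaults as far as we know; verify against your version) are assumptions. Only the 60s liveness delay comes from the source.

```python
import json


def litellm_pod_template(image: str, cpu: str = "1", memory: str = "2Gi",
                         app_label: str = "litellm") -> dict:
    """Build a pod template for the LiteLLM proxy (illustrative values only)."""
    # Identical requests and limits: HPA utilization maps directly to capacity,
    # and the pod lands in the Guaranteed QoS class.
    resources = {"cpu": cpu, "memory": memory}
    return {
        "metadata": {"labels": {"app": app_label}},
        "spec": {
            "affinity": {
                # Soft anti-affinity: prefer at most one LiteLLM pod per node.
                "podAntiAffinity": {
                    "preferredDuringSchedulingIgnoredDuringExecution": [{
                        "weight": 100,
                        "podAffinityTerm": {
                            "labelSelector": {"matchLabels": {"app": app_label}},
                            "topologyKey": "kubernetes.io/hostname",
                        },
                    }]
                }
            },
            "containers": [{
                "name": "litellm",
                "image": image,
                "ports": [{"containerPort": 4000}],  # assumed LiteLLM default port
                "resources": {"requests": dict(resources), "limits": dict(resources)},
                # 60s initial delay leaves room for Prisma migrations at startup.
                "livenessProbe": {
                    "httpGet": {"path": "/health/liveliness", "port": 4000},
                    "initialDelaySeconds": 60,
                    "periodSeconds": 10,
                },
                # Readiness timing is a hypothetical value, not from the source.
                "readinessProbe": {
                    "httpGet": {"path": "/health/readiness", "port": 4000},
                    "initialDelaySeconds": 30,
                    "periodSeconds": 10,
                },
            }],
        },
    }


template = litellm_pod_template(image="registry.example.com/litellm:TAG")  # hypothetical image
print(json.dumps(template, indent=2))
```

Preferred (soft) anti-affinity is used here so the HPA can still schedule extra replicas when there are more pods than nodes; the required variant gives a stricter spread guarantee at the cost of pods stuck in Pending.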
Key takeaway
For MLOps engineers and AI architects deploying an LLM gateway across multiple providers, this LiteLLM architecture on AWS EKS offers a production-grade blueprint. Consider adopting its GitOps-driven approach with ArgoCD for auditable, self-healing deployments, and reusing its HPA, caching, and database configurations to achieve high availability, cost efficiency, and predictable performance at 500-1000 requests per second.
Key insights
A robust LiteLLM deployment on AWS EKS with GitOps unifies LLM access, ensuring high availability and cost efficiency.
Principles
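- HPA scales predictably when resource requests equal limits, because CPU utilization is measured as a percentage of the request.
- Pod anti-affinity spreads replicas across nodes, so a single node failure does not take down the gateway.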
- GitOps provides declarative state, audit trails, and rollback.
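The first principle follows from how the HPA computes its target. Kubernetes documents the rule as desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), where utilization is a percentage of the pod's CPU request and no scaling happens while the ratio stays within a default tolerance of 0.1. The sketch below reimplements that rule for illustration; the min/max replica bounds and the millicore figures are hypothetical, and the 60% target matches the practice list.

```python
import math


def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    # HPA Utilization targets compare usage against the pod's *request*, not its limit.
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas: int, current_util: float, target_util: float,
                     min_replicas: int = 2, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: keep the current replica count.
        proposed = current_replicas
    else:
        proposed = math.ceil(current_replicas * ratio)
    # Clamp to the (hypothetical) HPA min/max bounds.
    return max(min_replicas, min(max_replicas, proposed))


# requests == limits == 1000m: utilization cannot run far past 100%.
equal = cpu_utilization_percent(900, 1000)
print(equal, desired_replicas(4, equal, 60.0))

# request 250m, limit 2000m: the same 900m of usage reads as 360%.
burst = cpu_utilization_percent(900, 250)
print(burst, desired_replicas(4, burst, 60.0, max_replicas=50))
```

With requests equal to limits, 900m of usage reads as 90% and scales 4 pods to 6. With a 250m request and a 2000m limit, the same load reads as 360% and asks for 24 pods, which is why a wide request/limit gap makes scaling jumpy.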
Method
Deploy LiteLLM on AWS EKS using ArgoCD, orchestrating components like HPA, PostgreSQL, Redis, and an internal ALB for high availability and cost optimization.
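The GitOps part of this method is typically an Argo CD `Application` resource that points at the Git path holding the LiteLLM manifests. A minimal sketch, assuming a hypothetical repository URL, path, and namespace: automated sync with `prune` and `selfHeal` makes Argo CD delete resources removed from Git and revert manual changes made directly in the cluster, so the Git history doubles as the audit trail.

```python
import json


def argocd_application(name: str, repo_url: str, path: str, namespace: str,
                       revision: str = "main") -> dict:
    """Argo CD Application that keeps `namespace` in sync with a Git path."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {"repoURL": repo_url, "targetRevision": revision, "path": path},
            "destination": {
                "server": "https://kubernetes.default.svc",  # the cluster Argo CD runs in
                "namespace": namespace,
            },
            "syncPolicy": {
                # prune: delete resources removed from Git; selfHeal: undo manual drift.
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


app = argocd_application(
    name="litellm",
    repo_url="https://github.com/example-org/llm-gateway-gitops.git",  # hypothetical
    path="litellm/overlays/prod",  # hypothetical
    namespace="litellm",
)
print(json.dumps(app, indent=2))
```

Under this setup, rolling back a bad release amounts to reverting the commit; Argo CD then syncs the cluster back to the previous state.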
In practice
- Set the LiteLLM liveness probe's initial delay to 60s so Prisma migrations can finish before the first check.
- Configure HPA CPU target at 60% to maintain p95 latency under 150ms.
- Use Redis for API key validation and response caching to reduce database load.
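Caching API key validation in Redis is usually a cache-aside pattern: check the cache, fall back to PostgreSQL on a miss, and store the result with a TTL. The sketch below is a generic illustration, not LiteLLM's internal implementation. To stay runnable without a server, a small in-memory class stands in for Redis `GET`/`SETEX`; the `apikey:` key prefix, the SHA-256 hashing of keys, the 60s TTL, and the dict acting as the database are all hypothetical.

```python
import hashlib
import time
from typing import Callable, Optional


class InMemoryTTLCache:
    """Stand-in for Redis GET/SETEX so the sketch runs without a server."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: dict[str, tuple[float, str]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:  # expired entries behave like misses
            del self._data[key]
            return None
        return value

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


class KeyValidator:
    """Cache-aside lookup: cache first, database only on a miss."""

    def __init__(self, cache, db_lookup: Callable[[str], Optional[str]], ttl_seconds: int = 60):
        self.cache = cache
        self.db_lookup = db_lookup
        self.ttl_seconds = ttl_seconds
        self.db_calls = 0  # counts database round trips for illustration

    def team_for_key(self, api_key: str) -> Optional[str]:
        # Hash the raw key so plaintext secrets never become cache keys.
        cache_key = "apikey:" + hashlib.sha256(api_key.encode()).hexdigest()
        cached = self.cache.get(cache_key)
        if cached is not None:
            return cached
        self.db_calls += 1
        team = self.db_lookup(api_key)
        if team is not None:  # only valid keys are cached
            self.cache.setex(cache_key, self.ttl_seconds, team)
        return team


now = [0.0]  # controllable fake clock
cache = InMemoryTTLCache(clock=lambda: now[0])
db = {"sk-demo-123": "team-a"}  # hypothetical key-to-team table
validator = KeyValidator(cache, db.get, ttl_seconds=60)

print(validator.team_for_key("sk-demo-123"), validator.db_calls)  # miss: hits the DB
print(validator.team_for_key("sk-demo-123"), validator.db_calls)  # hit: served from cache
now[0] = 61.0  # advance past the TTL
print(validator.team_for_key("sk-demo-123"), validator.db_calls)  # expired: DB again
```

The TTL is the trade-off knob: a longer TTL removes more database reads, but a revoked key can keep validating until its cached entry expires.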
Topics
- LiteLLM
- AWS EKS
- GitOps
- ArgoCD
- Kubernetes
- LLM Gateway
- High Availability
Best for: MLOps Engineer, AI Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.