OpenAI Scales Single-Primary PostgreSQL to Millions of Queries per Second for ChatGPT
Summary
OpenAI has successfully scaled a single-primary PostgreSQL instance on Azure Database for PostgreSQL to handle millions of queries per second for ChatGPT and its API platform, serving 800 million users globally. This achievement involved extensive optimizations at both the application and database layers, including scaling up instance size, refining query patterns, and deploying nearly 50 geo-distributed read replicas. To manage write-intensive workloads, OpenAI reduced redundant writes, directed new write-heavy operations to sharded systems like Azure Cosmos DB, and implemented lazy writes. Operational challenges such as cache-miss storms and ORM-generated multi-table joins were addressed by moving computation to the application layer and enforcing stricter transaction timeouts. Connection pooling with PgBouncer and workload isolation further ensured stable performance under global traffic spikes.
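One common way to blunt a cache-miss storm is to coalesce concurrent misses on the same key so that only one request reaches the database. The summary does not say OpenAI uses exactly this pattern, so the following is a minimal sketch under that assumption; `SingleFlightCache` and its `loader` argument are hypothetical names standing in for a real cache and a real database query.

```python
import threading


class SingleFlightCache:
    """Cache in which concurrent misses on one key trigger a single load."""

    def __init__(self, loader):
        self._loader = loader      # hypothetical function that queries the database
        self._values = {}          # key -> cached value
        self._inflight = {}        # key -> Event set when the current load finishes
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                # First caller to miss becomes the leader and performs the load.
                event = threading.Event()
                self._inflight[key] = event
        if leader:
            try:
                value = self._loader(key)
                with self._lock:
                    self._values[key] = value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()        # wake followers whether the load succeeded or failed
            return value
        # Followers wait for the leader instead of querying the database themselves.
        event.wait()
        return self.get(key)       # cached now, or retried if the leader's load failed
```

With this shape, a burst of requests for an uncached key costs the database one query rather than one per request. A production version would also need expiry and a bound on cache size.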
Key takeaway
For engineering leaders managing high-scale, globally distributed services, OpenAI's approach demonstrates that a single-primary PostgreSQL instance can sustain massive read-heavy AI workloads. Teams should prioritize application-level write reduction, offload write-intensive operations to sharded systems, and deploy geo-distributed read replicas to maintain low latency and strong consistency, deferring the complexity of fully distributed PostgreSQL until it is absolutely necessary.
Key insights
A single-primary PostgreSQL instance can scale to millions of QPS with strategic application and database optimizations.
Principles
- Optimize application and database layers concurrently.
- Isolate critical workloads to prevent noisy-neighbor effects.
- Distribute reads across geo-distributed replicas.
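The read-distribution principle can be sketched as a small router that sends writes to the primary and reads to a healthy replica in the caller's region. This is an illustrative sketch only: `Replica`, `QueryRouter`, the region labels, and the `needs_fresh_read` flag are assumptions made for the example and do not describe OpenAI's actual routing layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    region: str
    healthy: bool = True


class QueryRouter:
    """Routes writes to the primary and reads to a nearby healthy replica."""

    def __init__(self, primary, replicas):
        self.primary = primary            # name of the single primary
        self.replicas = list(replicas)
        self._next = 0                    # round-robin counter

    def route(self, is_write, client_region, needs_fresh_read=False):
        # Writes, and reads that cannot tolerate replica lag, go to the primary.
        if is_write or needs_fresh_read:
            return self.primary
        healthy = [r for r in self.replicas if r.healthy]
        if not healthy:
            return self.primary           # last resort: no replica is available
        # Prefer replicas in the caller's region; otherwise use any healthy one.
        local = [r for r in healthy if r.region == client_region]
        pool = local or healthy
        choice = pool[self._next % len(pool)]
        self._next += 1
        return choice.name
```

The `needs_fresh_read` escape hatch reflects a general property of asynchronous replicas: a read issued right after a write may not yet see it on a replica, so some reads still have to reach the primary.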
Method
OpenAI scaled PostgreSQL by optimizing instance size, refining query patterns, using read replicas, reducing redundant writes, and offloading write-heavy workloads to sharded systems like Azure Cosmos DB.
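Reducing redundant writes and deferring the rest can be illustrated at the application layer. The sketch below is a hypothetical `CoalescingWriter` that drops writes which would not change the stored value and collapses repeated updates to the same key into one batched write. The class, its `write_batch` callable, and the single-threaded design are assumptions for illustration, not OpenAI's implementation.

```python
_MISSING = object()  # sentinel for "no value persisted yet"


class CoalescingWriter:
    """Skips no-op writes and batches the remainder until flush() is called."""

    def __init__(self, write_batch):
        self._write_batch = write_batch   # hypothetical callable that persists {key: value}
        self._persisted = {}              # last value known to be stored per key
        self._pending = {}                # latest unflushed value per key

    def set(self, key, value):
        if self._persisted.get(key, _MISSING) == value:
            # The stored value already matches, so no write is needed.
            self._pending.pop(key, None)
            return
        # Later writes to the same key overwrite earlier pending ones.
        self._pending[key] = value

    def flush(self):
        if not self._pending:
            return 0
        batch = dict(self._pending)
        self._write_batch(batch)          # if this raises, pending writes are kept for retry
        self._persisted.update(batch)
        self._pending.clear()
        return len(batch)


# Example: three set() calls become one batched write of two rows.
batches = []
writer = CoalescingWriter(batches.append)
writer.set("a", 1)
writer.set("a", 2)
writer.set("b", 3)
flushed = writer.flush()
```

The trade-off is durability: anything still pending is lost if the process dies before `flush()`, so this suits data that tolerates a short delay, such as counters or last-seen timestamps.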
In practice
- Use PgBouncer for connection pooling in transaction mode.
- Direct new write-heavy workloads to sharded databases.
- Implement cascading replication for large replica counts.
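With cascading replication, standbys stream changes to other standbys, so the primary does not have to feed every replica directly. The arithmetic behind that can be sketched as below; `plan_tiers` is a hypothetical helper, and the fan-out limit of 5 is an illustrative assumption, not a figure from the article.

```python
def plan_tiers(total_replicas, max_fanout):
    """Return replicas per tier when each node feeds at most max_fanout downstream replicas."""
    if total_replicas < 0 or max_fanout < 1:
        raise ValueError("total_replicas must be >= 0 and max_fanout >= 1")
    tiers = []
    remaining = total_replicas
    upstreams = 1                      # tier 1 is fed by the primary alone
    while remaining > 0:
        tier = min(remaining, upstreams * max_fanout)
        tiers.append(tier)
        remaining -= tier
        upstreams = tier               # the next tier is fed by this tier's replicas
    return tiers


# Example: 50 replicas with an assumed fan-out of 5 per node.
tiers = plan_tiers(50, 5)
print(tiers)  # [5, 25, 20]
```

In this example the primary streams to 5 replicas instead of 50, at the cost of extra replication lag for replicas in deeper tiers.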
Topics
- PostgreSQL Scaling
- Database Optimization
- ChatGPT Infrastructure
- Read Replicas
- Distributed Databases
Best for: CTO, VP of Engineering/Data, Director of AI/ML, MLOps Engineer, AI Architect, Data Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.