OpenAI Scales Single Primary PostgreSQL to Millions of Queries per Second for ChatGPT

2026-02-12 · Source: InfoQ · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Advanced, quick

Summary

OpenAI has scaled a single-primary PostgreSQL instance on Azure Database for PostgreSQL to handle millions of queries per second for ChatGPT and its API platform, serving 800 million users globally. Getting there took optimizations at both the application and database layers: scaling up the instance size, refining query patterns, and deploying nearly 50 geo-distributed read replicas. To contain write load, OpenAI reduced redundant writes, directed new write-heavy workloads to sharded systems such as Azure Cosmos DB, and implemented lazy writes. Operational problems such as cache-miss storms and ORM-generated multi-table joins were addressed by moving computation to the application layer and enforcing stricter transaction timeouts. Connection pooling with PgBouncer and workload isolation kept performance stable under global traffic spikes.
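
The idea of moving join computation into the application layer can be illustrated with a minimal sketch. It assumes a hypothetical two-table schema (`users`, `conversations`) and uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere; the source does not describe OpenAI's actual schema or code.

```python
import sqlite3


def setup_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE conversations (
            id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT
        );
        INSERT INTO users VALUES (1, 'ada'), (2, 'lin');
        INSERT INTO conversations VALUES
            (10, 1, 'hello'), (11, 1, 'sql'), (12, 2, 'go');
        """
    )
    return conn


def conversations_with_owner(conn: sqlite3.Connection, user_ids: list) -> list:
    """Run two single-table lookups and join them in application code,
    instead of asking the database to execute a multi-table JOIN."""
    if not user_ids:
        return []
    marks = ",".join("?" for _ in user_ids)
    # Lookup 1: primary-key fetch of the owners, turned into an id -> name map.
    users = dict(
        conn.execute(f"SELECT id, name FROM users WHERE id IN ({marks})", user_ids)
    )
    # Lookup 2: the conversations for those owners.
    rows = conn.execute(
        "SELECT id, user_id, title FROM conversations "
        f"WHERE user_id IN ({marks}) ORDER BY id",
        user_ids,
    ).fetchall()
    # The "join" happens here, on application servers that scale horizontally.
    return [(cid, users[uid], title) for cid, uid, title in rows if uid in users]
```

Each query touches one table and is cheap to plan and execute; the trade-off is an extra round trip and join logic that the application now has to maintain.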

Key takeaway

For engineering leaders managing high-scale, globally distributed services, OpenAI's approach shows that a single-primary PostgreSQL deployment can sustain massive read-heavy AI workloads. Teams should prioritize application-level write reduction, offload new write-intensive workloads to sharded systems, and deploy geo-distributed read replicas to keep read latency low while the single primary remains the one authoritative copy for writes, deferring the complexity of fully distributed PostgreSQL until it is clearly necessary.

Key insights

A single-primary PostgreSQL instance can scale to millions of QPS with strategic application and database optimizations.

Principles

Optimize application and database layers concurrently.
Isolate critical workloads to prevent noisy-neighbor effects.
Distribute reads across geo-distributed replicas.
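
As a rough sketch of the read-distribution principle, the router below sends writes and in-transaction statements to the primary and spreads plain reads across replicas, preferring ones in the caller's region. The `Endpoint` and `Router` names, the region labels, and the `SELECT`-prefix check are all illustrative assumptions, not details from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    region: str


class Router:
    """Hypothetical read/write splitter for one primary and many replicas."""

    def __init__(self, primary: Endpoint, replicas: list):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0  # round-robin counter shared across reads

    def route(self, sql: str, client_region: str, in_transaction: bool = False) -> Endpoint:
        # Naive classification: only statements starting with SELECT count as reads.
        # A real router would also handle CTEs, SELECT ... FOR UPDATE, and so on.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or in_transaction or not self.replicas:
            return self.primary
        # Prefer replicas in the caller's region; fall back to any replica.
        local = [r for r in self.replicas if r.region == client_region]
        pool = local or self.replicas
        choice = pool[self._next % len(pool)]
        self._next += 1
        return choice
```

Replicas typically lag the primary slightly, so reads that must observe a just-committed write are usually kept on the primary, which is why this sketch pins in-transaction statements there.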

Method

OpenAI scaled PostgreSQL by scaling up the instance size, refining query patterns, adding read replicas, reducing redundant writes, and offloading write-heavy workloads to sharded systems such as Azure Cosmos DB.
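
The source does not say how the write reduction was implemented. One generic way to combine "skip redundant writes" with "lazy writes" is a small buffer that drops no-op updates, keeps only the latest value per key, and flushes in batches. The `LazyWriter` class and its `flush_fn` callback are hypothetical names for this sketch.

```python
_MISSING = object()  # sentinel: distinguishes "never stored" from a stored None


class LazyWriter:
    """Buffers writes, drops redundant ones, and flushes them in batches."""

    def __init__(self, flush_fn, max_pending: int = 100):
        self._flush_fn = flush_fn        # called with a {key: value} batch
        self._max_pending = max_pending
        self._stored = {}                # last value flushed per key
        self._pending = {}               # latest unflushed value per key

    def write(self, key, value) -> bool:
        """Queue a write. Returns False if it was skipped as redundant."""
        latest = self._pending.get(key, self._stored.get(key, _MISSING))
        if latest == value:
            return False                 # nothing would change: skip the write
        self._pending[key] = value       # later values overwrite earlier ones
        if len(self._pending) >= self._max_pending:
            self.flush()
        return True

    def flush(self) -> int:
        """Send pending writes as one batch. Returns the number of keys written."""
        # Drop keys whose pending value has returned to the stored value.
        batch = {
            k: v for k, v in self._pending.items()
            if self._stored.get(k, _MISSING) != v
        }
        self._pending.clear()
        if batch:
            self._flush_fn(batch)
            self._stored.update(batch)
        return len(batch)
```

Buffered writes are lost if the process dies before a flush, so this pattern suits data that tolerates brief staleness or loss, such as counters or last-seen timestamps, rather than critical records.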

In practice

Use PgBouncer for connection pooling in transaction mode.
Direct new write-heavy workloads to sharded databases.
Implement cascading replication for large replica counts.
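
To see why cascading replication helps at large replica counts, the sketch below plans a replication tree in which no node streams to more than a fixed number of downstream replicas. The `plan_cascade` helper, the replica names, and the fan-out limit of 8 are assumptions made for illustration; the source gives no topology details.

```python
def plan_cascade(replica_names: list, max_fanout: int) -> dict:
    """Map each replica to its upstream so that no node (the primary or an
    intermediate replica) feeds more than max_fanout downstream replicas."""
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    upstream_of = {}
    nodes = ["primary"]  # nodes in breadth-first order; earlier nodes fill first
    for i, name in enumerate(replica_names):
        # Replicas 0..f-1 attach to the primary, the next f to the first
        # replica, and so on, which fills the tree level by level.
        upstream_of[name] = nodes[i // max_fanout]
        nodes.append(name)
    return upstream_of


names = [f"replica-{n}" for n in range(50)]
plan = plan_cascade(names, max_fanout=8)
direct = sum(1 for upstream in plan.values() if upstream == "primary")
print(f"primary streams to {direct} of {len(names)} replicas directly")
```

With 50 replicas and a fan-out of 8, the primary serves only 8 replication streams instead of 50; the cost is additional replication lag for replicas further down the tree.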

Topics

PostgreSQL Scaling
Database Optimization
ChatGPT Infrastructure
Read Replicas
Distributed Databases

Best for: CTO, VP of Engineering/Data, Director of AI/ML, MLOps Engineer, AI Architect, Data Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by InfoQ.