Rate Limiter System Design: Token Bucket, Leaky Bucket, Scaling

2025-10-14 · Source: ByteByteGo · Field: Technology & Digital — Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Intermediate, medium

Summary

Rate limiters protect APIs such as GitHub, Stripe, and AWS from overload and ensure fair access by controlling how fast each client can send requests. Key requirements include limiting requests according to configurable rules (e.g., 100 API requests per minute per user), rejecting excess requests with HTTP 429 and informative headers, adding minimal latency (under 3 milliseconds at P95), and staying highly available across multiple servers. Fixed Window Counting is simple, but it has a critical flaw at window boundaries: with a limit of 100 requests per minute, a client can, for example, send 100 requests in the last 10 seconds of one window and 100 more in the first 10 seconds of the next, so 200 requests get through in 20 seconds. The Token Bucket algorithm, an industry standard, addresses this by refilling tokens at a steady rate and letting unused tokens accumulate up to a fixed capacity during quiet periods, which permits legitimate bursts while capping the sustained rate. Implementation options include client-side (unreliable, since clients can bypass it), server-side (rate-limiting logic mixed into application code), and middleware (a dedicated service), with middleware often offering the best balance. A typical architecture has a middleware service that stores rules and token bucket state in Redis and processes each request atomically to prevent race conditions when multiple instances run in parallel.
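
The window-boundary flaw is easiest to see with numbers. The following is a minimal sketch, not code from the original article: `FixedWindowCounter` and `accepted_in_burst` are hypothetical names, and timestamps are passed in explicitly so the example is deterministic. It applies the 100-requests-per-minute rule to two bursts that straddle a window boundary.

```python
from collections import defaultdict


class FixedWindowCounter:
    """Fixed-window limiter: one counter per window index (illustrative only)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # window index -> requests accepted

    def allow(self, now):
        # Every timestamp inside the same window maps to the same counter.
        window = int(now // self.window_seconds)
        if self.counts[window] >= self.limit:
            return False
        self.counts[window] += 1
        return True


def accepted_in_burst(limiter, start, end, n):
    """Send n evenly spaced requests in [start, end) and count how many pass."""
    step = (end - start) / n
    return sum(limiter.allow(start + i * step) for i in range(n))


limiter = FixedWindowCounter(limit=100, window_seconds=60)

# 100 requests in the last 10 seconds of the first window (t = 50..60) ...
late = accepted_in_burst(limiter, 50.0, 60.0, 100)
# ... then 100 more in the first 10 seconds of the second window (t = 60..70).
early = accepted_in_burst(limiter, 60.0, 70.0, 100)

total_in_20_seconds = late + early
print(total_in_20_seconds)
```

Each burst stays within its own window's limit, so both are accepted in full even though together they double the intended per-minute rate over a 20-second span.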

Key takeaway

If you are a DevOps engineer or software architect designing API infrastructure, prioritize rate limiting built on the Token Bucket algorithm. It absorbs request bursts while holding the sustained request rate to a fixed limit, which prevents system overload and keeps resource access fair. Deploy the logic as a dedicated middleware layer, such as an API gateway, and use Redis for shared state and atomic operations so that scaling out to multiple instances does not introduce race conditions.

Key insights

The Token Bucket algorithm is an industry standard for robust API rate limiting, balancing simplicity and effectiveness.
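
To make the mechanics concrete, here is a minimal single-process sketch of a token bucket. It is an illustration under stated assumptions, not the article's implementation: the class name, the `capacity` and `refill_rate` parameters, and the injectable `clock` are all hypothetical choices, and the sketch is not thread-safe.

```python
import time
from typing import Callable


class TokenBucket:
    """In-memory token bucket (illustrative; single-threaded use only)."""

    def __init__(self, capacity: float, refill_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock
        self.tokens = capacity          # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill lazily: credit tokens for the time elapsed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class FakeClock:
    """Manually advanced clock so the demo below is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
bucket = TokenBucket(capacity=10, refill_rate=2.0, clock=clock)

# A burst of 15 requests at once: only the 10 stored tokens can be spent.
burst = sum(bucket.allow() for _ in range(15))

# After 3 quiet seconds, 3 * 2.0 = 6 tokens have accumulated.
clock.advance(3.0)
after_wait = sum(bucket.allow() for _ in range(15))
```

In this sketch `capacity` bounds the size of a burst and `refill_rate` bounds the long-run average, which is the trade-off the insight above refers to.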

Principles

Rate limiters protect APIs from overload and ensure fair access.
Atomic operations are essential for distributed counter consistency.
Middleware offers the best balance for rate limiting implementation.

Method

Implement rate limiting via a middleware service that fetches rules, stores token bucket states in Redis, and uses atomic operations for request processing and token decrement.
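
One common way to get the atomic check-and-decrement is a Lua script, because Redis executes a script as a single uninterrupted unit. The sketch below is an assumption-laden illustration, not the article's code: the key layout (a hash with `tokens` and `ts` fields), the script itself, and the `allow_request` helper are hypothetical. It expects a client object exposing redis-py's `eval(script, numkeys, *keys_and_args)` method and does not import `redis` itself.

```python
import time

# Runs inside Redis as one atomic step, so the read, refill, and decrement
# for a key cannot interleave with another request for the same key.
TOKEN_BUCKET_LUA = """
local capacity    = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local now         = tonumber(ARGV[3])

local state  = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(state[1])
local ts     = tonumber(state[2])
if tokens == nil or ts == nil then
  tokens = capacity   -- first request for this key: start with a full bucket
  ts = now
end

local elapsed = math.max(0, now - ts)
tokens = math.min(capacity, tokens + elapsed * refill_rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)
-- An idle bucket refills completely anyway, so letting the key expire is safe.
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill_rate) * 2)
return allowed
"""


def allow_request(client, key, capacity, refill_rate, now=None):
    """Return True if the request identified by `key` may proceed.

    `client` is assumed to behave like redis.Redis, i.e. to provide
    eval(script, numkeys, *keys_and_args).
    """
    if now is None:
        now = time.time()  # assumption: middleware hosts have synced clocks
    result = client.eval(TOKEN_BUCKET_LUA, 1, key, capacity, refill_rate, now)
    return result == 1


# Hypothetical usage against a real server (not executed here):
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   allow_request(client, "rl:user:42", capacity=100, refill_rate=100 / 60)
```

Passing the timestamp from the middleware keeps the script simple but makes the limiter sensitive to clock skew between instances, so treat that as a design choice to revisit rather than a recommendation.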

In practice

Use Redis for shared, fast, in-memory counter storage.
Configure token bucket capacity for burst tolerance.
Place rate limiting logic in an API gateway or reverse proxy.
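
As a rough illustration of how a rule could translate into bucket settings and a rejection response, the sketch below derives a refill rate from "N requests per T seconds" and builds the HTTP 429 reply. The `RateLimitRule` and `rejection_response` names and the `burst` field are hypothetical. `Retry-After` is a standard HTTP header, while the `X-RateLimit-*` names are a widely used convention rather than a standard, so their use here is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitRule:
    """A configurable rule, e.g. 100 requests per 60 seconds (illustrative)."""

    requests: int      # sustained allowance per period
    per_seconds: int   # length of the period in seconds
    burst: int         # bucket capacity: how large a burst is tolerated

    @property
    def refill_rate(self) -> float:
        # Tokens added per second to sustain `requests` per period.
        return self.requests / self.per_seconds


def rejection_response(rule: RateLimitRule, tokens_left: float, cost: float = 1.0):
    """Return (status, headers) for a request that exceeded the limit."""
    deficit = max(0.0, cost - tokens_left)
    # Whole seconds until enough tokens have refilled; at least 1 second.
    retry_after = max(1, math.ceil(deficit / rule.refill_rate))
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(rule.requests),
        "X-RateLimit-Remaining": str(int(tokens_left)),
    }
    return 429, headers


rule = RateLimitRule(requests=100, per_seconds=60, burst=20)
status, headers = rejection_response(rule, tokens_left=0.0)
```

A larger `burst` tolerates spikier clients at the cost of higher short-term load on the backend, while the refill rate alone determines the long-run average.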

Topics

Rate Limiting
System Design
Token Bucket Algorithm
Redis
API Gateway
Distributed Systems

Best for: AI Architect, Software Engineer, DevOps Engineer, IT Professional

Editorial summary, takeaway, and curation by AIssential. Original article published by ByteByteGo.