Rate Limiting — Algorithms and Implementation

Rate limiting protects resources, controls cost, and reduces abuse. It looks simple, but in distributed systems decisions accumulate around accuracy, fairness, and UX. This article covers the fixed window, sliding window, token bucket, and leaky bucket algorithms, Redis-based implementations, options at other layers (CDN, proxy), and the shape of a 429 response.

1. Two axes

Two axes of rate limiting:

Accuracy — "Does it really not exceed N per period?"
Smoothness — "Even with bursts, do user-facing responses stay continuous?"

The two axes trade off. Algorithm choice fixes the balance.

2. Fixed Window

Maintain a counter for a fixed time bucket (for example, one minute):

key = "rl:user:42:" + (now / 60)
count = INCR key
EXPIRE key 60 NX
if count > limit: reject

Pros — simple. Cons — boundary effect (burst at the boundary). With a 50 per minute limit, 100 calls can pass between second 59 and second 01.

3. Sliding Window Log

Store each request's timestamp in a sorted set, drop entries older than one minute, and count:

ZADD rl:user:42 <ts> <ts>
ZREMRANGEBYSCORE rl:user:42 0 (now-60)
count = ZCARD rl:user:42

Pros — accurate. Cons — memory grows with request volume.

4. Sliding Window Counter

Take a weighted average of the previous and current windows. It compromises between fixed window's memory efficiency and sliding window's smoothness:

weight = (now - current_window_start) / window_size
estimate = previous_count * (1 - weight) + current_count

Often cited from Cloudflare's public write-up.

5. Token Bucket

A bucket with capacity N and refill rate r; requests consume tokens. If empty, reject or wait:

on request:
  tokens = min(capacity, tokens + (now - last) * rate)
  if tokens >= 1:
    tokens -= 1; allow
  else:
    reject
  last = now

Pros — burst tolerance plus average control. Adopted in many libraries and frameworks (Guava RateLimiter, NGINX limit_req with burst).

6. Leaky Bucket

A variant of the token bucket. Requests enter a queue and drain at a steady rate. The average is steady, but bursts are delayed in the queue or rejected. A familiar model from network traffic shaping.

7. Redis-based implementation

The simplest counter:

INCR rl:<key>:<window>
EXPIRE rl:<key>:<window> <ttl> NX
if count > limit: reject

NX ensures EXPIRE is set only once. Otherwise the TTL refreshes on every request, blurring the window boundary.

Use a Lua script for atomicity — prevents another command from slipping between INCR and EXPIRE:

local c = redis.call('INCR', KEYS[1])
if c == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return c

Sliding window with a Sorted Set:

redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1] - ARGV[2])
local count = redis.call('ZCARD', KEYS[1])
if count >= tonumber(ARGV[3]) then
  return -1
end
redis.call('ZADD', KEYS[1], ARGV[1], ARGV[1] .. ':' .. ARGV[4])
redis.call('EXPIRE', KEYS[1], ARGV[2])
return count + 1

8. Libraries

Node — rate-limiter-flexible, @upstash/ratelimit (serverless friendly).
Java / Spring — Bucket4j, Resilience4j, Spring Cloud Gateway.
Python — limits, slowapi (FastAPI), django-ratelimit.
Go — golang.org/x/time/rate.

A vetted library is safer than rolling your own.

9. Where to enforce

CDN / edge — block costs as early as possible:

Cloudflare Rate Limiting — by IP, URL, or header.
AWS WAF Rate-based Rules — counts per IP.
Akamai, Fastly — similar features.

The strength of edge blocking is that traffic never reaches origin. The limit is fine-grained per-authenticated-user control is hard.

Reverse proxies:

NGINX limit_req / limit_conn — memory-based leaky bucket. burst option.
HAProxy stick-table — counting against varied keys.
Caddy rate_limit — provided as a module.

Application level — fine-grained control by business logic (user, plan, API key). Defaults to Redis or in-memory.

A two-layer combination (edge + application) is common — the edge protects broadly, the application enforces business rules.

10. HTTP 429

RFC 6585 (2012). The standard status code for telling the client they have made too many requests.

Headers to send along:

Header	Meaning
`Retry-After`	Seconds to wait until the next request, or an absolute time
`X-RateLimit-Limit`	The window's limit
`X-RateLimit-Remaining`	Remaining requests
`X-RateLimit-Reset`	Reset time (epoch)

The X-RateLimit-* headers are de facto standard but not yet a formal RFC (RFC 9210 draft and others in progress). Choose consistent names and document them.

11. Key choice and tiering

Key choice:

IP — default for anonymous users. Beware NAT and shared mobile carrier IPs.
User ID — for authenticated users.
API key — for machine callers.
IP + route — for sensitive endpoints like login.

Multiple keys at once (per-user + per-IP) gives layered protection.

Policy tiering:

Different limits for unauthenticated, authenticated, and paid plans.
Different limits per route (writes < reads).
Progressive backoff (short violation → short block, repeated → longer block).

12. Common pitfalls

Counter accuracy in distributed systems — when several servers handle requests for the same user, counters must aggregate in one place (Redis). In-memory counters multiply the limit by the number of nodes.

Clock drift — out-of-sync node clocks blur window boundaries. Sync via NTP.

Cache miss for time-keyed entries — fixed window keys expire automatically over time, but if the first request misses the EXPIRE, the key sticks forever. Use the NX option or Lua.

Missing burst tolerance — even normal users hit short bursts via page reloads. Too tight a limit damages UX.

Collateral damage on shared IPs — multiple users behind NAT get blocked together. Switch to a user key after authentication.

Missing response shape — if clients cannot distinguish 429 from a generic error, retry logic breaks. Retry-After and a standard format help.

Redis as SPOF — decide how the rate limit behaves when Redis is down (fail-open or fail-closed).

Hard to test — rate-limit behavior depends on time. Tests should abstract time or use very short windows.

Closing thoughts

Rate limiting can start as a simple counter, but operations slowly reveal trade-offs in accuracy, fairness, and UX. Redis Sliding Window Counter + Lua atomicity + tiered keys (user / IP / route) + standard 429 headers make a stable foundation when they sit together. Use vetted libraries; tests should use short windows.

input-validation-zod
password-hashing

We refer to Cloudflare blog — sliding window, Stripe Engineering — Rate Limiters, RFC 6585 (429), RFC 9210 draft RateLimit headers, NGINX limit_req, Bucket4j, rate-limiter-flexible, and @upstash/ratelimit.

Rate Limiting — Algorithms and Implementation

1. Two axes

Two axes of rate limiting:

Accuracy — "Does it really not exceed N per period?"
Smoothness — "Even with bursts, do user-facing responses stay continuous?"

The two axes trade off. Algorithm choice fixes the balance.

2. Fixed Window

Maintain a counter for a fixed time bucket (for example, one minute):

key = "rl:user:42:" + (now / 60)
count = INCR key
EXPIRE key 60 NX
if count > limit: reject

Pros — simple. Cons — boundary effect (burst at the boundary). With a 50 per minute limit, 100 calls can pass between second 59 and second 01.

3. Sliding Window Log

Store each request's timestamp in a sorted set, drop entries older than one minute, and count:

ZADD rl:user:42 <ts> <ts>
ZREMRANGEBYSCORE rl:user:42 0 (now-60)
count = ZCARD rl:user:42

Pros — accurate. Cons — memory grows with request volume.

4. Sliding Window Counter

Take a weighted average of the previous and current windows. It compromises between fixed window's memory efficiency and sliding window's smoothness:

weight = (now - current_window_start) / window_size
estimate = previous_count * (1 - weight) + current_count

Often cited from Cloudflare's public write-up.

5. Token Bucket

A bucket with capacity N and refill rate r; requests consume tokens. If empty, reject or wait:

on request:
  tokens = min(capacity, tokens + (now - last) * rate)
  if tokens >= 1:
    tokens -= 1; allow
  else:
    reject
  last = now

Pros — burst tolerance plus average control. Adopted in many libraries and frameworks (Guava RateLimiter, NGINX limit_req with burst).

6. Leaky Bucket

7. Redis-based implementation

The simplest counter:

INCR rl:<key>:<window>
EXPIRE rl:<key>:<window> <ttl> NX
if count > limit: reject

NX ensures EXPIRE is set only once. Otherwise the TTL refreshes on every request, blurring the window boundary.

Use a Lua script for atomicity — prevents another command from slipping between INCR and EXPIRE:

local c = redis.call('INCR', KEYS[1])
if c == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return c

Sliding window with a Sorted Set:

redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1] - ARGV[2])
local count = redis.call('ZCARD', KEYS[1])
if count >= tonumber(ARGV[3]) then
  return -1
end
redis.call('ZADD', KEYS[1], ARGV[1], ARGV[1] .. ':' .. ARGV[4])
redis.call('EXPIRE', KEYS[1], ARGV[2])
return count + 1

8. Libraries

Node — rate-limiter-flexible, @upstash/ratelimit (serverless friendly).
Java / Spring — Bucket4j, Resilience4j, Spring Cloud Gateway.
Python — limits, slowapi (FastAPI), django-ratelimit.
Go — golang.org/x/time/rate.

A vetted library is safer than rolling your own.

9. Where to enforce

CDN / edge — block costs as early as possible:

Cloudflare Rate Limiting — by IP, URL, or header.
AWS WAF Rate-based Rules — counts per IP.
Akamai, Fastly — similar features.

The strength of edge blocking is that traffic never reaches origin. The limit is fine-grained per-authenticated-user control is hard.

Reverse proxies:

NGINX limit_req / limit_conn — memory-based leaky bucket. burst option.
HAProxy stick-table — counting against varied keys.
Caddy rate_limit — provided as a module.

Application level — fine-grained control by business logic (user, plan, API key). Defaults to Redis or in-memory.

A two-layer combination (edge + application) is common — the edge protects broadly, the application enforces business rules.

10. HTTP 429

RFC 6585 (2012). The standard status code for telling the client they have made too many requests.

Headers to send along:

Header	Meaning
`Retry-After`	Seconds to wait until the next request, or an absolute time
`X-RateLimit-Limit`	The window's limit
`X-RateLimit-Remaining`	Remaining requests
`X-RateLimit-Reset`	Reset time (epoch)

The X-RateLimit-* headers are de facto standard but not yet a formal RFC (RFC 9210 draft and others in progress). Choose consistent names and document them.

11. Key choice and tiering

Key choice:

IP — default for anonymous users. Beware NAT and shared mobile carrier IPs.
User ID — for authenticated users.
API key — for machine callers.
IP + route — for sensitive endpoints like login.

Multiple keys at once (per-user + per-IP) gives layered protection.

Policy tiering:

Different limits for unauthenticated, authenticated, and paid plans.
Different limits per route (writes < reads).
Progressive backoff (short violation → short block, repeated → longer block).

12. Common pitfalls

Clock drift — out-of-sync node clocks blur window boundaries. Sync via NTP.

Cache miss for time-keyed entries — fixed window keys expire automatically over time, but if the first request misses the EXPIRE, the key sticks forever. Use the NX option or Lua.

Missing burst tolerance — even normal users hit short bursts via page reloads. Too tight a limit damages UX.

Collateral damage on shared IPs — multiple users behind NAT get blocked together. Switch to a user key after authentication.

Missing response shape — if clients cannot distinguish 429 from a generic error, retry logic breaks. Retry-After and a standard format help.

Redis as SPOF — decide how the rate limit behaves when Redis is down (fail-open or fail-closed).

Hard to test — rate-limit behavior depends on time. Tests should abstract time or use very short windows.

Closing thoughts

input-validation-zod
password-hashing

Rate Limiting — Algorithms and Implementation

Rate Limiting — Algorithms and Implementation

1. Two axes

2. Fixed Window

3. Sliding Window Log

4. Sliding Window Counter

5. Token Bucket

6. Leaky Bucket

7. Redis-based implementation

8. Libraries

9. Where to enforce

10. HTTP 429

11. Key choice and tiering

12. Common pitfalls

Closing thoughts

Next

More in security

Rate Limiting — Algorithms and Implementation

Rate Limiting — Algorithms and Implementation

1. Two axes

2. Fixed Window

3. Sliding Window Log

4. Sliding Window Counter

5. Token Bucket

6. Leaky Bucket

7. Redis-based implementation

8. Libraries

9. Where to enforce

10. HTTP 429

11. Key choice and tiering

12. Common pitfalls

Closing thoughts

Next

More in security