Why Global Rate Limits Fail Multi-Tenant APIs

A single global bucket is easy to build but unfair in production: every tenant draws from the same allowance, so one noisy tenant can exhaust it and degrade service for everyone else. Per-tenant limits with burst control are the safer model.

1) Define tenant-scoped limit keys

// Build a limit key scoped to one tenant and one route.
function key(tenantId: string, route: string): string {
  return `rl:${tenantId}:${route}`;
}
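
One caveat with colon-delimited keys: if a tenant ID or route can itself contain a colon (for example a route template such as /users/:id), two different pairs can collapse into the same key. Below is a minimal sketch of one way to avoid that, assuming the key store accepts percent-encoded characters. The name safeKey is hypothetical and not part of the original code.

```typescript
// Hypothetical variant of key(): percent-encode each part so that a ":" inside
// a tenant ID or route cannot be mistaken for the delimiter.
function safeKey(tenantId: string, route: string): string {
  return `rl:${encodeURIComponent(tenantId)}:${encodeURIComponent(route)}`;
}

// Without encoding, both calls below would yield "rl:a:b:c".
const k1 = safeKey("a:b", "c"); // "rl:a%3Ab:c"
const k2 = safeKey("a", "b:c"); // "rl:a:b%3Ac"
console.log(k1 !== k2); // true
```

If tenant IDs and routes come from a fixed, colon-free vocabulary, the plain key() above is enough.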

2) Use a token bucket per key

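type Bucket = { tokens: number; lastRefillMs: number };

// Refill the bucket for the elapsed time, then try to spend one token.
function allow(b: Bucket, nowMs: number, ratePerSec: number, capacity: number): boolean {
  // Clamp to zero so a clock that steps backwards cannot drain tokens.
  const elapsedMs = Math.max(0, nowMs - b.lastRefillMs);
  const refill = (elapsedMs / 1000) * ratePerSec;
  b.tokens = Math.min(capacity, b.tokens + refill); // capacity bounds the burst size
  b.lastRefillMs = nowMs;
  if (b.tokens < 1) return false;
  b.tokens -= 1;
  return true;
}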
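
To tie steps 1 and 2 together, here is a minimal sketch of a per-key bucket store. It repeats key() and allow() so it runs on its own. The in-memory Map, the allowRequest name, and the default rate and capacity are assumptions for illustration; a multi-instance deployment would need a shared store instead.

```typescript
type Bucket = { tokens: number; lastRefillMs: number };

function key(tenantId: string, route: string): string {
  return `rl:${tenantId}:${route}`;
}

function allow(b: Bucket, nowMs: number, ratePerSec: number, capacity: number): boolean {
  const elapsedMs = Math.max(0, nowMs - b.lastRefillMs);
  b.tokens = Math.min(capacity, b.tokens + (elapsedMs / 1000) * ratePerSec);
  b.lastRefillMs = nowMs;
  if (b.tokens < 1) return false;
  b.tokens -= 1;
  return true;
}

// Hypothetical in-memory store: one bucket per tenant-and-route key.
const buckets = new Map<string, Bucket>();

function allowRequest(
  tenantId: string,
  route: string,
  nowMs: number,
  ratePerSec: number = 1, // assumed default: 1 token per second
  capacity: number = 5,   // assumed default: bursts of up to 5 requests
): boolean {
  const k = key(tenantId, route);
  let b = buckets.get(k);
  if (!b) {
    b = { tokens: capacity, lastRefillMs: nowMs }; // new keys start with a full bucket
    buckets.set(k, b);
  }
  return allow(b, nowMs, ratePerSec, capacity);
}

// A noisy tenant sends 20 requests at the same instant.
const t0 = 1000000;
let noisyAllowed = 0;
for (let i = 0; i < 20; i++) {
  if (allowRequest("noisy", "/orders", t0)) noisyAllowed++;
}

// A quiet tenant on the same route has its own bucket and is unaffected.
const quietAllowed = allowRequest("quiet", "/orders", t0);
console.log(noisyAllowed, quietAllowed); // 5 true
```

The noisy tenant is capped at its own burst capacity, while the quiet tenant's first request still succeeds.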

3) Emit limit headers for client backoff

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
Retry-After: 5
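
These header values can be derived from the bucket state rather than hard-coded. The sketch below assumes that X-RateLimit-Limit reports the bucket capacity and that Retry-After is sent only on denials, as whole seconds until one token is available again. The limitHeaders helper is a hypothetical name.

```typescript
type Bucket = { tokens: number; lastRefillMs: number };

// Hypothetical helper: build response headers from the bucket state
// after the allow/deny decision has been made.
function limitHeaders(
  b: Bucket,
  allowed: boolean,
  ratePerSec: number,
  capacity: number,
): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(capacity),
    "X-RateLimit-Remaining": String(Math.floor(b.tokens)),
  };
  if (!allowed) {
    // Seconds until the bucket holds one whole token again, rounded up,
    // with a floor of 1 so clients never retry immediately.
    const missing = 1 - b.tokens;
    headers["Retry-After"] = String(Math.max(1, Math.ceil(missing / ratePerSec)));
  }
  return headers;
}

// Denied request: empty bucket refilling at 0.2 tokens per second.
const denied = limitHeaders({ tokens: 0, lastRefillMs: 0 }, false, 0.2, 60);
console.log(denied["Retry-After"]); // "5"

// Allowed request: no Retry-After header is emitted.
const ok = limitHeaders({ tokens: 12.4, lastRefillMs: 0 }, true, 0.2, 60);
console.log(ok["X-RateLimit-Remaining"]); // "12"
```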

Failure patterns

  • One shared limit across all tenants.
  • Limit denials with no retry hint.
  • No metric split by tenant tier.

What to verify

  • Noisy tenants cannot starve others.
  • 429 responses carry a Retry-After value so clients can back off predictably.
  • Dashboards show limit pressure by tenant and route.
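
For the dashboard check, one option is to count allow and deny decisions per tenant, tier, and route, then chart the denial ratio. This is a minimal in-memory sketch; the record and denialRatio names, the tier labels, and the tenant names are hypothetical, and a real service would export these counts to its metrics system.

```typescript
type Outcome = "allowed" | "denied";

// Hypothetical in-memory counters keyed by tenant, tier, route, and outcome.
const counts = new Map<string, number>();

function record(tenantId: string, tier: string, route: string, outcome: Outcome): void {
  const k = `${tenantId}|${tier}|${route}|${outcome}`;
  counts.set(k, (counts.get(k) ?? 0) + 1);
}

// Share of requests denied for one tenant on one route:
// a simple "limit pressure" signal to chart or alert on.
function denialRatio(tenantId: string, tier: string, route: string): number {
  const allowed = counts.get(`${tenantId}|${tier}|${route}|allowed`) ?? 0;
  const denied = counts.get(`${tenantId}|${tier}|${route}|denied`) ?? 0;
  const total = allowed + denied;
  return total === 0 ? 0 : denied / total;
}

record("acme", "free", "/orders", "allowed");
record("acme", "free", "/orders", "denied");
record("acme", "free", "/orders", "denied");
record("globex", "pro", "/orders", "allowed");

const acmePressure = denialRatio("acme", "free", "/orders");   // 2 of 3 denied
const globexPressure = denialRatio("globex", "pro", "/orders"); // none denied
console.log(acmePressure, globexPressure);
```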
