Why Global Rate Limits Fail Multi-Tenant APIs
A single global bucket is easy to build and unfair in production: one noisy tenant can drain the shared budget and push every other tenant into 429 responses. Per-tenant limits with burst control are the safer model.
1) Define tenant-scoped limit keys
```typescript
function key(tenantId: string, route: string): string {
  return `rl:${tenantId}:${route}`;
}
```
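Keys only isolate tenants if each key maps to its own bucket. A minimal in-memory sketch, assuming a single process; `bucketFor`, the `buckets` map, and the tenant names are hypothetical, and a fleet of API servers would need a shared store instead.

```typescript
type Bucket = { tokens: number; lastRefillMs: number };

function key(tenantId: string, route: string): string {
  return `rl:${tenantId}:${route}`;
}

// Hypothetical in-memory store: one bucket per tenant/route key.
const buckets = new Map<string, Bucket>();

// Returns the bucket for this key, creating a full one on first use so a
// new tenant can burst immediately (a design choice, not a requirement).
function bucketFor(tenantId: string, route: string, capacity: number, nowMs: number): Bucket {
  const k = key(tenantId, route);
  let b = buckets.get(k);
  if (b === undefined) {
    b = { tokens: capacity, lastRefillMs: nowMs };
    buckets.set(k, b);
  }
  return b;
}

const acmeBucket = bucketFor("acme", "/v1/orders", 60, 0);
const globexBucket = bucketFor("globex", "/v1/orders", 60, 0);
acmeBucket.tokens = 0; // acme exhausts its own budget...
console.log(globexBucket.tokens); // 60: ...without touching globex's bucket
```

One caveat with this key format: if tenant IDs or routes can contain `:`, two different pairs can produce the same key (tenant `a:b` on route `c` and tenant `a` on route `b:c` both map to `rl:a:b:c`), so IDs should be validated or escaped first.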
2) Use a token bucket per key
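```typescript
type Bucket = { tokens: number; lastRefillMs: number };

// Refill the bucket for the elapsed time, then spend one token if available.
// ratePerSec is the sustained rate; capacity caps the burst size.
function allow(b: Bucket, nowMs: number, ratePerSec: number, capacity: number): boolean {
  // Clamp elapsed time and never move lastRefillMs backwards, so a clock
  // that steps back cannot remove tokens or credit the same interval twice.
  const elapsedMs = Math.max(0, nowMs - b.lastRefillMs);
  b.tokens = Math.min(capacity, b.tokens + (elapsedMs / 1000) * ratePerSec);
  b.lastRefillMs = Math.max(b.lastRefillMs, nowMs);
  if (b.tokens < 1) return false; // denied: the caller should respond with 429
  b.tokens -= 1;
  return true;
}
```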
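To see the burst-then-sustain behavior, here is the same function replayed against a fixed request timeline. Capacity 2 and 1 token per second are illustrative values, not recommendations.

```typescript
type Bucket = { tokens: number; lastRefillMs: number };

function allow(b: Bucket, nowMs: number, ratePerSec: number, capacity: number): boolean {
  const elapsedMs = Math.max(0, nowMs - b.lastRefillMs);
  b.tokens = Math.min(capacity, b.tokens + (elapsedMs / 1000) * ratePerSec);
  b.lastRefillMs = Math.max(b.lastRefillMs, nowMs);
  if (b.tokens < 1) return false;
  b.tokens -= 1;
  return true;
}

// Illustrative parameters: capacity 2 (burst), 1 token per second (sustained).
const bucket: Bucket = { tokens: 2, lastRefillMs: 0 };
const timeline = [0, 0, 0, 500, 1000, 10000]; // request times in ms
const results = timeline.map((t) => allow(bucket, t, 1, 2));
console.log(results); // [ true, true, false, false, true, true ]
```

The first two requests spend the burst, the third is denied, half a second refills only half a token, and a long idle gap refills the bucket to `capacity` rather than beyond it.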
3) Emit limit headers for client backoff
```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
Retry-After: 5
```
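The header values can be derived from the bucket right after `allow()` runs. A sketch, assuming the `X-RateLimit-*` naming shown above (a widely used convention rather than a standard) and `Retry-After` in delay-seconds as defined by HTTP semantics (RFC 9110); `rateLimitHeaders` is a hypothetical helper.

```typescript
type Bucket = { tokens: number; lastRefillMs: number };

// Hypothetical helper: builds response headers from a bucket that allow()
// has already refilled (and charged, if the request was admitted).
function rateLimitHeaders(b: Bucket, ratePerSec: number, capacity: number): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": String(capacity),
    // Whole requests the client could still make right now.
    "X-RateLimit-Remaining": String(Math.floor(b.tokens)),
  };
  if (b.tokens < 1) {
    // Seconds until one full token has refilled. Retry-After takes an
    // integer number of seconds, so round up rather than down.
    headers["Retry-After"] = String(Math.ceil((1 - b.tokens) / ratePerSec));
  }
  return headers;
}

const denied = rateLimitHeaders({ tokens: 0.25, lastRefillMs: 0 }, 0.25, 60);
console.log(denied["Retry-After"]); // 3
```

Rounding up matters: rounding down could tell a client to retry before a full token exists, which only produces another 429.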
Failure patterns
- One shared limit across all tenants.
- Limit denials (429s) sent without a Retry-After hint.
- No metrics broken down by tenant tier.
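The missing per-tier breakdown is usually addressed with a decision counter labeled by tier, tenant, and route. A minimal in-process sketch; `recordDecision`, `denialRatio`, and the tier names are hypothetical stand-ins for whatever metrics library the service already uses.

```typescript
type Outcome = "allowed" | "denied";

// Hypothetical in-process counters keyed by tier, tenant, route, and outcome.
const counts = new Map<string, number>();

function recordDecision(tier: string, tenantId: string, route: string, outcome: Outcome): void {
  const k = `${tier}|${tenantId}|${route}|${outcome}`;
  counts.set(k, (counts.get(k) ?? 0) + 1);
}

// Share of requests denied for one tenant/route: a simple "limit pressure" signal.
function denialRatio(tier: string, tenantId: string, route: string): number {
  const allowed = counts.get(`${tier}|${tenantId}|${route}|allowed`) ?? 0;
  const denied = counts.get(`${tier}|${tenantId}|${route}|denied`) ?? 0;
  const total = allowed + denied;
  return total === 0 ? 0 : denied / total;
}

recordDecision("free", "acme", "/v1/orders", "allowed");
recordDecision("free", "acme", "/v1/orders", "denied");
recordDecision("free", "acme", "/v1/orders", "denied");
recordDecision("free", "acme", "/v1/orders", "denied");
recordDecision("pro", "globex", "/v1/orders", "allowed");
console.log(denialRatio("free", "acme", "/v1/orders")); // 0.75
```

In a real metrics backend each distinct tenant label value can become its own time series, so services with many tenants may prefer to label by tier and drill into individual tenants separately.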
What to verify
- Noisy tenants cannot starve others.
- 429 responses include a Retry-After value, so client retries are predictable.
- Dashboards show limit pressure by tenant and route.
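The first check can be written as a small replay test: a noisy tenant sends 100 requests at once, then a quiet tenant sends one. A sketch reusing the token bucket above with illustrative limits; `quietTenantAllowed` and the tenant names are hypothetical, and a real test would exercise the service's actual limiter.

```typescript
type Bucket = { tokens: number; lastRefillMs: number };

function allow(b: Bucket, nowMs: number, ratePerSec: number, capacity: number): boolean {
  const elapsedMs = Math.max(0, nowMs - b.lastRefillMs);
  b.tokens = Math.min(capacity, b.tokens + (elapsedMs / 1000) * ratePerSec);
  b.lastRefillMs = Math.max(b.lastRefillMs, nowMs);
  if (b.tokens < 1) return false;
  b.tokens -= 1;
  return true;
}

const RATE = 1; // tokens per second (illustrative)
const CAPACITY = 60; // burst size (illustrative)

// Replays a burst from a noisy tenant followed by one request from a quiet
// tenant at the same instant; reports whether the quiet request got through.
function quietTenantAllowed(perTenant: boolean): boolean {
  const buckets = new Map<string, Bucket>();
  const get = (tenant: string): Bucket => {
    const k = perTenant ? `rl:${tenant}:/v1/orders` : "rl:global";
    let b = buckets.get(k);
    if (!b) {
      b = { tokens: CAPACITY, lastRefillMs: 0 };
      buckets.set(k, b);
    }
    return b;
  };
  for (let i = 0; i < 100; i++) allow(get("noisy"), 0, RATE, CAPACITY);
  return allow(get("quiet"), 0, RATE, CAPACITY);
}

console.log(quietTenantAllowed(false)); // false: the noisy tenant drained the global bucket
console.log(quietTenantAllowed(true)); // true: the quiet tenant has its own bucket
```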