Queue Retries Without Chaos: Backoff + Dead-Letter Design

Retrying failed jobs blindly can amplify outages. A stable queue system needs bounded retries, exponential backoff, and dead-letter isolation.

Step 1: classify retryable vs terminal failures

function isRetryable(err: Error): boolean {
  return /timeout|503|rate limit/i.test(err.message)
}

Step 2: apply capped exponential delay

function nextDelay(attempt: number) {
  return Math.min(300, 2 ** attempt) // seconds
}

Step 3: route exhausted jobs to DLQ with context

{
  "job_id": "job-81",
  "attempts": 6,
  "last_error": "503 upstream timeout"
}

Pitfall

Infinite retries on non-retryable payload errors. The queue never drains and hides true incidents.

Verification

  • Retry count caps are enforced.
  • DLQ contains enough context for manual replay.
  • Queue lag recovers after upstream incidents.

Get New Tutorials by Email

No spam. Just clear, practical breakdowns you can apply right away.

Enjoy this tutorial?

Get new practical tech tutorials in your inbox.