Web Scraping at Scale: Adaptive Pacing Without Getting Blocked

Fixed-rate scraping fails in production because server tolerance changes over time. Adaptive pacing uses feedback signals such as timeouts and HTTP 429 responses to stay stable and respectful.

Step 1: Track per-domain response quality

# Counters for one domain; keep one such dict per domain
stats = {
  "ok": 0,
  "timeouts": 0,
  "status_429": 0
}
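A minimal sketch of recording outcomes per domain; the `record_result` helper and the `defaultdict` layout are assumptions layered on the snippet above, not part of the original:

```python
from collections import defaultdict

# One counter dict per domain, created on first access.
stats = defaultdict(lambda: {"ok": 0, "timeouts": 0, "status_429": 0})

def record_result(domain, status=None, timed_out=False):
    """Bump the appropriate counter for one completed request."""
    s = stats[domain]
    if timed_out:
        s["timeouts"] += 1
    elif status == 429:
        s["status_429"] += 1
    elif status is not None and 200 <= status < 300:
        s["ok"] += 1

record_result("example.com", status=200)
record_result("example.com", timed_out=True)
```

Keeping the counters per domain matters because one slow or throttling host should not slow the whole crawl.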

Step 2: Adjust delay window based on recent errors

def next_delay(base, stats):
    # Each timeout adds 0.3 s and each 429 adds 0.6 s, capped at 8 s total.
    penalty = stats["timeouts"] * 0.3 + stats["status_429"] * 0.6
    return min(8.0, base + penalty)
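Worked through on hypothetical counters (the specific numbers are illustrative, not from the original), the delay grows with recent errors and saturates at the cap:

```python
def next_delay(base, stats):
    # Each timeout adds 0.3 s and each 429 adds 0.6 s, capped at 8 s total.
    penalty = stats["timeouts"] * 0.3 + stats["status_429"] * 0.6
    return min(8.0, base + penalty)

# Quiet domain: no recent errors, so the base delay is used as-is.
calm = next_delay(1.0, {"ok": 50, "timeouts": 0, "status_429": 0})

# Noisy domain: 2 timeouts and 1 throttle add 1.2 s of penalty.
busy = next_delay(1.0, {"ok": 40, "timeouts": 2, "status_429": 1})

# Badly throttled domain: the penalty would exceed 8 s, so the cap applies.
capped = next_delay(1.0, {"ok": 5, "timeouts": 10, "status_429": 10})
```

In practice the counters should cover a recent window (for example, reset every N requests); otherwise errors from hours ago keep the delay inflated forever.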

Step 3: Persist crawl cursor for resumability

# Checkpoint after each page so an interrupted run can resume
state["last_url"] = current_url
save_state(state)
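One way to implement `save_state` is a JSON checkpoint with an atomic replace, so a crash mid-write never leaves a corrupt file. The filename and JSON format here are assumptions, not from the original:

```python
import json
import os
import tempfile

STATE_FILE = "crawl_state.json"  # hypothetical checkpoint path

def save_state(state, path=STATE_FILE):
    # Write to a temp file first, then atomically swap it into place.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_state(path=STATE_FILE):
    # Resume from the last checkpoint, or start fresh if none exists.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"last_url": None}

state = load_state()
state["last_url"] = "https://example.com/page/42"
save_state(state)
```

`os.replace` is atomic on the same filesystem, which is what makes the checkpoint safe against partial writes.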

Pitfalls

  • Constant request interval regardless of server signals.
  • No checkpointing, forcing full restart after errors.
  • Ignoring Retry-After on throttled responses.
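The Retry-After pitfall takes only a few lines to avoid. This sketch assumes the header carries delta-seconds; RFC 9110 also allows an HTTP-date form, which a full client would parse instead of falling back:

```python
def backoff_from_429(headers, fallback=8.0):
    """Honor Retry-After on a throttled response, else use a fallback delay."""
    value = headers.get("Retry-After")
    if value is None:
        return fallback
    try:
        return max(0.0, float(value))  # delta-seconds form, e.g. "30"
    except ValueError:
        return fallback                # HTTP-date form: punt to the fallback

server_says = backoff_from_429({"Retry-After": "30"})
no_header = backoff_from_429({})
```

The server's own number should always win over the locally computed delay; it is the most direct feedback signal available.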

Validation

  • Throttle events decrease after adaptive pacing.
  • Interrupted jobs resume from saved cursor.
  • Error budget per domain stays within target.
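The error-budget check above can be a one-liner over the per-domain counters. The 5% budget here is an assumed target, not from the original:

```python
def within_error_budget(stats, budget=0.05):
    """True if the domain's error fraction is at or under the budget."""
    errors = stats["timeouts"] + stats["status_429"]
    total = stats["ok"] + errors
    return total == 0 or errors / total <= budget

healthy = within_error_budget({"ok": 98, "timeouts": 1, "status_429": 1})
degraded = within_error_budget({"ok": 80, "timeouts": 15, "status_429": 5})
```

Domains that blow their budget are good candidates for a longer base delay or a temporary pause.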
