Timezone Bugs in Data Pipelines: Normalize at Ingest or Suffer Later

If one source sends local time and another sends UTC, downstream metrics drift silently. Normalize all timestamps at ingest and keep timezone metadata explicit.

Step 1: parse source timestamp with declared source timezone

from zoneinfo import ZoneInfo
from datetime import datetime

def parse_local(ts: str, tz: str) -> datetime:
    """Attach the declared source timezone to a naive ISO-8601 timestamp."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is not None:
        # An already-aware timestamp would be silently shifted by .replace()
        raise ValueError(f"expected naive timestamp, got offset-aware: {ts}")
    return dt.replace(tzinfo=ZoneInfo(tz))
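One subtlety Step 1 must survive: during the fall-back DST transition, a wall-clock time occurs twice, and `datetime.fold` selects which occurrence is meant. A minimal sketch for America/New_York (the timestamp is illustrative):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# 01:30 on 2024-11-03 happens twice in America/New_York
# (clocks fall back at 02:00).
ambiguous = datetime.fromisoformat("2024-11-03T01:30:00").replace(
    tzinfo=ZoneInfo("America/New_York")
)
print(ambiguous.utcoffset())             # fold=0: first pass, EDT (UTC-4)

second_pass = ambiguous.replace(fold=1)  # fold=1: second pass, EST (UTC-5)
print(second_pass.utcoffset())
```

If your sources can emit times inside that repeated hour, the `fold` choice must come from the source system; the parser alone cannot disambiguate.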

Step 2: convert immediately to UTC for storage

utc_dt = local_dt.astimezone(ZoneInfo("UTC"))
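In full, with an illustrative local timestamp: `astimezone` converts between zones without changing the instant, and `datetime.timezone.utc` is an equivalent target that needs no tz database lookup.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# An aware local timestamp (illustrative value).
local_dt = datetime(2024, 6, 1, 12, 0, tzinfo=ZoneInfo("America/New_York"))

# Convert to UTC for storage; the instant is unchanged, only the offset moves.
utc_dt = local_dt.astimezone(timezone.utc)
print(utc_dt.isoformat())  # 2024-06-01T16:00:00+00:00
```

Note that `astimezone` on a naive datetime assumes the system's local zone, which is exactly the implicit behavior this pipeline is trying to eliminate, so always attach the source zone first.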

Step 3: store original timezone in audit column

row["source_tz"] = "America/New_York"
row["event_time_utc"] = utc_dt.isoformat()

Pitfall

Storing naive datetimes in warehouse tables and assuming every consumer reads them in the same zone. A naive value carries no offset, so each reader silently applies their own assumption, and the drift only surfaces when two teams' numbers disagree.
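A short sketch of how the same stored value diverges under two readers' assumptions (the zone names are illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A naive value pulled from a warehouse column -- no zone attached.
stored = datetime(2024, 6, 1, 12, 0)

# Team A assumes it is UTC; Team B assumes New York local time.
team_a = stored.replace(tzinfo=timezone.utc)
team_b = stored.replace(tzinfo=ZoneInfo("America/New_York"))

# Same stored bytes, instants four hours apart.
print((team_a - team_b).total_seconds() / 3600)  # -4.0
```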

Verification

  • DST transitions do not produce duplicate or missing hourly buckets.
  • All downstream transforms consume UTC fields only.
  • Audit columns trace original source timezone.
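The DST check above can be scripted. A sketch that buckets one event per UTC hour across the 2024-03-10 US spring-forward night: bucketing by UTC stays uniform, while bucketing by local wall clock drops the 02:xx hour entirely (the date and zone are illustrative).

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")

# One event per UTC hour, starting at midnight New York time.
start = datetime(2024, 3, 10, 5, 0, tzinfo=timezone.utc)
events = [start + timedelta(hours=h) for h in range(6)]

utc_buckets = [e.strftime("%H") for e in events]
local_buckets = [e.astimezone(ny).strftime("%H") for e in events]

print(utc_buckets)    # ['05', '06', '07', '08', '09', '10']
print(local_buckets)  # ['00', '01', '03', '04', '05', '06'] -- hour 02 vanishes
```

The mirror-image bug appears in November: local-time bucketing double-counts the repeated 01:xx hour, while UTC bucketing is again uniform.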
