Computer Vision Models Fail Quietly Before Training
People blame model architecture for poor vision results, but a large share of failures starts before training, in preprocessing inconsistencies: crop-policy drift, color-normalization mismatches, and label noise.
Step 1: Normalize images through one deterministic pipeline
```python
import cv2
import numpy as np

def preprocess(img, size=(224, 224)):
    # OpenCV decodes images as BGR; convert once so color statistics
    # match everywhere the pipeline runs.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # INTER_AREA is the standard choice for downscaling.
    img = cv2.resize(img, size, interpolation=cv2.INTER_AREA)
    # Scale to [0, 1] float32 before any model-specific normalization.
    return img.astype(np.float32) / 255.0
```
Step 2: Make label validation explicit
```python
def validate_labels(rows):
    # Fail fast on any label outside the known class set.
    allowed = {"cat", "dog", "bird"}
    bad = [r for r in rows if r["label"] not in allowed]
    if bad:
        raise ValueError(f"invalid labels: {len(bad)}")
```
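Calling the validator before training aborts the run on the first bad row. A minimal usage sketch with hypothetical rows (the validator is restated so the snippet is self-contained):

```python
def validate_labels(rows):
    allowed = {"cat", "dog", "bird"}
    bad = [r for r in rows if r["label"] not in allowed]
    if bad:
        raise ValueError(f"invalid labels: {len(bad)}")

# One row with a typo'd label ("Dog") should abort the pipeline.
rows = [{"label": "cat"}, {"label": "Dog"}, {"label": "bird"}]
try:
    validate_labels(rows)
    failed = False
except ValueError as e:
    failed = True
    message = str(e)
```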
Step 3: Track dataset fingerprints
```shell
find data/train -type f -print0 | sort -z | xargs -0 shasum > data/train.sha1
```
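The same idea can live inside the training script. This is a sketch, not the command above: a hypothetical `dataset_fingerprint` helper that folds sorted relative paths and file contents into one SHA-1, so any added, removed, or edited file changes the hash:

```python
import hashlib
import tempfile
from pathlib import Path

def dataset_fingerprint(root):
    """One hash over sorted relative paths and file contents."""
    root = Path(root)
    h = hashlib.sha1()
    for p in sorted(root.rglob("*")):
        if p.is_file():
            h.update(str(p.relative_to(root)).encode())
            h.update(p.read_bytes())
    return h.hexdigest()

# Demo on a throwaway directory: editing one file changes the hash.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "a.txt").write_text("cat")
    fp1 = dataset_fingerprint(d)
    (Path(d) / "a.txt").write_text("dog")
    fp2 = dataset_fingerprint(d)
```

Log the fingerprint with every experiment; a retraining run on silently changed data then becomes visible in the logs.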
Pitfalls
- Different resize policies between training and inference.
- Manual label edits with no audit trail.
- No dataset fingerprinting before retraining.
Verification
- Preprocess outputs are identical across repeated runs.
- Invalid labels fail pipeline before training starts.
- Dataset hash changes are visible in experiment logs.