Computer Vision Models Fail Quietly Before Training
People blame model architecture for poor vision results, but a large share of failures starts before training, in preprocessing inconsistencies: crop-policy drift, color-normalization mismatches, and label noise.
Step 1: Normalize images through one deterministic pipeline
```python
import cv2
import numpy as np

def preprocess(img, size=(224, 224)):
    # OpenCV decodes images as BGR; convert once so color statistics
    # match everywhere the pipeline runs.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # INTER_AREA is the standard choice for downscaling.
    img = cv2.resize(img, size, interpolation=cv2.INTER_AREA)
    # Scale to [0, 1] float32 before any model-specific normalization.
    return img.astype(np.float32) / 255.0
```
Step 2: Make label validation explicit
```python
def validate_labels(rows):
    # Fail fast on any label outside the known class set.
    allowed = {"cat", "dog", "bird"}
    bad = [r for r in rows if r["label"] not in allowed]
    if bad:
        raise ValueError(f"invalid labels: {len(bad)}")
```
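Calling the validator before training aborts the run on the first bad row. A minimal usage sketch with hypothetical rows (the validator is restated so the snippet is self-contained):

```python
def validate_labels(rows):
    allowed = {"cat", "dog", "bird"}
    bad = [r for r in rows if r["label"] not in allowed]
    if bad:
        raise ValueError(f"invalid labels: {len(bad)}")

# One row with a typo'd label ("Dog") should abort the pipeline.
rows = [{"label": "cat"}, {"label": "Dog"}, {"label": "bird"}]
try:
    validate_labels(rows)
    failed = False
except ValueError as e:
    failed = True
    message = str(e)
```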
Step 3: Track dataset fingerprints
```shell
find data/train -type f -print0 | sort -z | xargs -0 shasum > data/train.sha1
```
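The same idea can live inside the training script. This is a sketch, not the command above: a hypothetical `dataset_fingerprint` helper that folds sorted relative paths and file contents into one SHA-1, so any added, removed, or edited file changes the hash:

```python
import hashlib
import tempfile
from pathlib import Path

def dataset_fingerprint(root):
    """One hash over sorted relative paths and file contents."""
    root = Path(root)
    h = hashlib.sha1()
    for p in sorted(root.rglob("*")):
        if p.is_file():
            h.update(str(p.relative_to(root)).encode())
            h.update(p.read_bytes())
    return h.hexdigest()

# Demo on a throwaway directory: editing one file changes the hash.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "a.txt").write_text("cat")
    fp1 = dataset_fingerprint(d)
    (Path(d) / "a.txt").write_text("dog")
    fp2 = dataset_fingerprint(d)
```

Log the fingerprint with every experiment; a retraining run on silently changed data then becomes visible in the logs.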
Pitfalls
- Different resize policies between training and inference.
- Manual label edits with no audit trail.
- No dataset fingerprinting before retraining.
Verification
- Preprocess outputs are identical across repeated runs.
- Invalid labels fail pipeline before training starts.
- Dataset hash changes are visible in experiment logs.