Feature Store Versioning for Reliable ML Training Runs
Training inconsistencies often come from untracked feature changes. A versioned feature store makes experiments reproducible across weeks and teams.
Step 1: Version feature definitions as immutable specs
FEATURE_SET_V3 = {
"user_age_days": "int",
"sessions_7d": "float",
"avg_watch_time": "float",
}
Step 2: Materialize with feature-set ID in output path
output = f"s3://ml/features/v3/date={run_date}/part-000.parquet"
Step 3: Persist training metadata with feature version
run_meta = {
"model": "retention_xgb",
"feature_set": "v3",
"data_cut": run_date,
}
Pitfall
Reusing the same feature path while mutating transformation logic. You cannot explain metric changes later.
Verification
- Model card includes exact feature set ID.
- Older runs can be replayed with same feature snapshot.
- Feature schema drift triggers alert before training.