Ensemble Learning in Practice: Why Your Second Model Should Disagree
If all models in an ensemble make the same mistakes, adding them does not improve generalization. The goal is controlled disagreement with measurable diversity.
Step 1: Train on different feature views
# One tree per feature view; each view exposes a different slice of the signal
models = [
    train_tree(X_basic, y),
    train_tree(X_stat_features, y),
    train_tree(X_sequence_features, y),
]
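The article does not define `train_tree`, so here is a hypothetical stand-in: a pure-NumPy decision stump that keeps the snippet dependency-free. In practice you would use a real tree learner such as `sklearn.tree.DecisionTreeClassifier`; only the signature `train_tree(X, y)` is taken from the article.

```python
import numpy as np

def train_tree(X, y, seed=0):
    """Minimal stand-in for a tree learner: pick the single
    (feature, threshold) split with the lowest training error."""
    rng = np.random.default_rng(seed)
    best = None
    for j in rng.permutation(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = (X[:, j] > t).astype(int)
            err = np.mean(pred != y)
            flip = err > 0.5          # invert the rule if that is better
            score = min(err, 1 - err)
            if best is None or score < best[0]:
                best = (score, j, t, flip)
    _, j, t, flip = best
    # Return a predict function closed over the chosen split
    return lambda X_new: ((X_new[:, j] > t) != flip).astype(int)
```

Because each call sees a different feature matrix, the stumps split on different signals, which is exactly what produces the disagreement the next step measures.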
Step 2: Measure pairwise error correlation
import numpy as np

def error_corr(e1, e2):
    # e1, e2 are boolean per-sample error masks; correlation near zero
    # means the two models fail on different examples.
    return np.corrcoef(e1.astype(int), e2.astype(int))[0, 1]
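The pairwise helper extends naturally to a full correlation matrix over all models, which is what you would actually inspect. A sketch, assuming `preds` is a list of per-model prediction arrays (the names `pairwise_error_corr` and `preds` are illustrative, not from the article):

```python
import numpy as np

def pairwise_error_corr(preds, y_true):
    """Correlation of error indicators for every model pair.

    preds: list of 1-D prediction arrays, one per model.
    Returns an (n_models, n_models) matrix; off-diagonal entries
    near zero indicate the diversity the ensemble needs.
    """
    errors = [(p != y_true).astype(int) for p in preds]
    n = len(errors)
    corr = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            c = np.corrcoef(errors[i], errors[j])[0, 1]
            corr[i, j] = corr[j, i] = c
    return corr
```

Two models with identical errors score 1.0; models that fail on disjoint examples score negative, which is the best case for voting.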
Step 3: Weight votes by validation performance
# Softmax over holdout AUCs: better-validated models get larger votes
from scipy.special import softmax

weights = softmax(np.array([m.val_auc for m in models]))
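Put together, the weighting step can be sketched end to end. `softmax` is written inline here (equivalent to `scipy.special.softmax`), and `ensemble_proba` is an illustrative name, not from the article:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax; equivalent to scipy.special.softmax
    z = np.exp(x - np.max(x))
    return z / z.sum()

def ensemble_proba(probas, val_aucs):
    """Blend per-model positive-class probabilities.

    probas: list of (n_samples,) arrays, one per model.
    val_aucs: each model's AUC on a holdout split that none of the
    base models were trained on (see the pitfalls below).
    """
    weights = softmax(np.array(val_aucs))
    return sum(w * np.asarray(p) for w, p in zip(weights, probas))
```

Softmax keeps every model in the vote while letting validation AUC tilt the blend; with equal AUCs it degrades gracefully to a plain average.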
Pitfalls
- Using the same preprocessing pipeline and random seed for every model, which collapses diversity.
- Selecting models by accuracy alone while ignoring error correlation.
- Tuning ensemble weights without a separate holdout set, which leaks validation signal into the weights.
Validation
- Ensemble outperforms the best single model on holdout data.
- Pairwise error correlations stay below defined threshold.
- Weighting scheme remains stable across reruns.
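The three acceptance criteria above can be encoded as an explicit checklist. A sketch under assumed names: `validate_ensemble`, `corr_threshold`, and `weight_tol` are illustrative; `err_corr` is the pairwise error-correlation matrix from Step 2.

```python
import numpy as np

def validate_ensemble(ens_auc, single_aucs, err_corr, weight_runs,
                      corr_threshold=0.5, weight_tol=0.05):
    """Check the three acceptance criteria on holdout data.

    ens_auc: holdout AUC of the ensemble.
    single_aucs: holdout AUC of each base model.
    err_corr: pairwise error-correlation matrix.
    weight_runs: list of weight vectors from repeated reruns.
    """
    off_diag = err_corr[~np.eye(len(err_corr), dtype=bool)]
    checks = {
        "beats_best_single": ens_auc > max(single_aucs),
        "diverse_errors": np.max(off_diag) < corr_threshold,
        "stable_weights": all(
            np.allclose(weight_runs[0], w, atol=weight_tol)
            for w in weight_runs[1:]
        ),
    }
    return checks
```

Any failed check points back to a specific step: a correlation failure to Step 1's feature views, a weighting instability to Step 3's holdout split.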