I Improved My Model… But My Score Got Worse

2026-06-20 · Source: Machine Learning on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, long

Summary

An analysis of the Ames Housing competition investigates why changes that improve cross-validation (CV) scores often fail to generalize. The study fixed a cross-validation strategy before building any models, then used tree-based models (CatBoost, LightGBM, HistGradientBoosting, and XGBoost) as baselines. Initial feature engineering, including ordinal encoding, binary presence flags, and interaction features such as "HouseAge = YrSold - YearBuilt", consistently improved model performance. More advanced techniques, such as log transformation of skewed numerical variables, category consolidation, and frequency encoding, yielded mixed results, showing that added feature complexity does not guarantee better generalization. Log-transforming the target sale price improved model stability and the leaderboard scores of XGBoost and HistGradientBoosting, but not those of LightGBM or CatBoost. The findings emphasize that CV results alone are insufficient to predict generalization: some simpler feature sets outperformed more complex configurations on unseen data.
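The three domain-driven features named in the summary can be sketched in a few lines of pandas. This is a minimal illustration on toy rows, not the article's code: "YrSold" and "YearBuilt" come from the summary, while "PoolArea", "ExterQual", the derived column names, and the quality ordering are assumptions chosen for the example.

```python
import pandas as pd

# Toy rows standing in for the housing data. YrSold and YearBuilt appear in
# the summary; PoolArea and ExterQual are assumed column names.
df = pd.DataFrame({
    "YrSold": [2008, 2010, 2006],
    "YearBuilt": [2003, 1976, 2006],
    "PoolArea": [0, 512, 0],
    "ExterQual": ["Gd", "TA", "Ex"],
})

# Interaction feature quoted in the summary.
df["HouseAge"] = df["YrSold"] - df["YearBuilt"]

# Binary presence flag: 1 if the house has a pool at all, else 0.
df["HasPool"] = (df["PoolArea"] > 0).astype(int)

# Ordinal encoding for a ranked quality scale (assumed order, worst to best).
quality_order = {"Po": 0, "Fa": 1, "TA": 2, "Gd": 3, "Ex": 4}
df["ExterQualOrd"] = df["ExterQual"].map(quality_order)
```

Each feature encodes something a buyer would recognize (age, presence of an amenity, quality rank), which is the sense in which the summary calls them domain-driven.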

Key takeaway

For data scientists building regression models on tabular data: establish a robust cross-validation strategy before extensive feature engineering. Focus on domain-driven feature transformations, such as ordinal encoding or interaction features, as these often generalize better than complex statistical methods. Be wary of optimizing solely for cross-validation scores, which may not reflect true generalization; always evaluate against unseen data to catch overfitting introduced during feature engineering.

Key insights

Cross-validation improvements don't guarantee generalization; simpler, domain-driven features often outperform complex ones.

Principles

Validation strategy must precede model building.
Domain-driven features often generalize best.
Model complexity doesn't ensure better generalization.

Method

The process involved establishing cross-validation, building baseline tree models, applying initial domain-driven feature engineering, then advanced statistical feature engineering, and finally target variable transformation.

In practice

Use "ColumnTransformer" for consistent encoding.
Encode missing values to capture absence signals.
Apply ordinal encoding for naturally ranked categories.
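The three tips above can be combined in one preprocessing object. The following is a minimal sketch, assuming a ranked quality column in which a missing value means the feature is absent; the column names "BsmtQual" and "GrLivArea", the "Missing" label, and the level ordering are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

# Toy training frame; the NaN stands for "no basement" in this example.
train = pd.DataFrame({
    "BsmtQual": pd.Series(["Gd", np.nan, "TA", "Ex"], dtype=object),
    "GrLivArea": [1710, 1262, 1786, 2198],
})

# Assumed ranking, worst to best, with absence as its own lowest level.
levels = ["Missing", "Po", "Fa", "TA", "Gd", "Ex"]

ranked = Pipeline([
    # Turn NaN into an explicit category so absence carries a signal.
    ("absent", SimpleImputer(strategy="constant", fill_value="Missing")),
    # Map each level to its position in the ranking.
    ("ordinal", OrdinalEncoder(categories=[levels])),
])

# One object holds every encoding rule; untouched columns pass through.
preprocess = ColumnTransformer(
    [("ranked", ranked, ["BsmtQual"])],
    remainder="passthrough",
)

encoded = preprocess.fit_transform(train)
```

Fitting the transformer once and reusing it with transform on validation or test rows keeps the encoding identical across splits.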

Topics

Feature Engineering
Cross-Validation
Model Generalization
Gradient Boosting
Tabular Data
Target Transformation

Best for: Machine Learning Engineer, Data Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.