I Pitted XGBoost Against Logistic Regression on 358 Matches. The Boring Model Won.

Source: Towards Data Science · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Intermediate, medium

Summary

An analysis comparing five classifiers on 358 historical international football matches found that a simple logistic regression model achieved the best log-loss, 1.001. XGBoost scored 1.169, which is worse than the 1.099 (ln 3) that uniform guessing over three outcomes would earn. The experiment used 5-fold cross-validation to predict one of three outcomes (win, draw, or away win) from three features: team strength gap, combined strength, and a knockout flag. Logistic regression reached 54% accuracy against 48% for XGBoost. The result is attributed to the bias-variance trade-off: on limited data, a high-capacity model like XGBoost overfits and produces confidently miscalibrated probabilities, which log-loss penalizes heavily. Logistic regression's inductive bias matched the data's linear log-odds relationship, so it needed less data to produce stable estimates.

Key takeaway

Machine learning engineers building predictive models on small or low-dimensional datasets should try simpler models such as logistic regression before complex ensembles. Start by establishing a strong baseline and scoring it with a proper scoring rule such as log-loss, not accuracy alone. Avoid over-parameterized models, which can become confidently miscalibrated and score worse than uniform guessing. Add complexity only when learning curves show empirically that it pays off at your data volume.

Key insights

Model complexity must match data availability and structure to avoid overfitting and miscalibration.
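
To make "confident miscalibration" concrete, here is a minimal sketch using only the Python standard library. The ten labels and both forecasters are hypothetical toy values, not data from the article; the point is only that a forecaster can be right half the time and still score worse on log-loss than uniform guessing.

```python
import math

def mean_log_loss(probs, labels):
    """Average negative log of the probability given to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

# Hypothetical toy data (not from the article): 10 matches, classes 0, 1, 2.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2]

# Uniform forecaster: one third on every class, every match.
uniform = [[1 / 3, 1 / 3, 1 / 3]] * len(labels)

# Overconfident forecaster: always puts 0.90 on class 0.
overconfident = [[0.90, 0.05, 0.05]] * len(labels)

uniform_loss = mean_log_loss(uniform, labels)
overconfident_loss = mean_log_loss(overconfident, labels)
# The overconfident forecaster always predicts class 0, so it is right
# whenever the true label is 0.
overconfident_accuracy = sum(y == 0 for y in labels) / len(labels)

print(f"uniform log-loss:       {uniform_loss:.3f}")
print(f"overconfident log-loss: {overconfident_loss:.3f}")
print(f"overconfident accuracy: {overconfident_accuracy:.0%}")
```

In this toy setup the overconfident forecaster is correct on 50% of matches, yet its log-loss is about 1.55 against about 1.10 for uniform guessing, because each confident miss costs -ln(0.05), roughly 3.0.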

Method

Evaluate models using 5-fold cross-validation with log-loss as the primary metric, and compare each against the uniform-guessing baseline, which is ln(K) for K classes (ln 3 ≈ 1.099 for three outcomes).
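
The evaluation method above might be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the article's code: the data is synthetic (only the sample size of 358 and the three feature names come from the summary), the coefficients used to generate outcomes are made up, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_matches = 358  # sample size from the article; the data itself is synthetic

# Hypothetical stand-ins for the three features named in the summary.
strength_gap = rng.normal(0.0, 1.0, n_matches)
combined_strength = rng.normal(0.0, 1.0, n_matches)
knockout = rng.integers(0, 2, n_matches)
X = np.column_stack([strength_gap, combined_strength, knockout])

# Assumption: outcomes are drawn from a model whose log-odds are linear
# in the strength gap. The 0.8 coefficient is arbitrary.
logits = np.column_stack(
    [0.8 * strength_gap, np.zeros(n_matches), -0.8 * strength_gap]
)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    # Stand-in for XGBoost, not the article's model.
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Log-loss of always predicting 1/K for each of K classes.
uniform_baseline = np.log(len(np.unique(y)))

cv_log_loss = {}
for name, model in models.items():
    # scikit-learn reports negated log-loss so that higher is better.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_log_loss")
    cv_log_loss[name] = -scores.mean()
    print(f"{name}: mean CV log-loss = {cv_log_loss[name]:.3f}")

print(f"uniform baseline: {uniform_baseline:.3f}")
```

Any model whose mean cross-validated log-loss lands above the uniform baseline is, on this metric, doing worse than not looking at the features at all. The numbers this sketch prints depend on the synthetic data and should not be read as a reproduction of the article's results.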

Best for: Machine Learning Engineer, Data Scientist, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.