Every Supervised Machine Learning Model Is Linear

· Source: Agus’s Substack · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, medium

Summary

A new series argues that plain linear models can outperform XGBoost on the California Housing dataset, reaching R² > 0.83, by improving feature geometry rather than adding model parameters. The approach uses geometric algebra to treat feature engineering as a principled, systematic way of constructing features. It relies on standard train/test splits, with no data leakage and no target encoding. The core idea is that XGBoost implicitly learns geometric relationships in its tree structure, and that these relationships can be extracted explicitly and reused. A linear model built on the extracted features can then capture the same complex interactions that usually give XGBoost its edge on tabular data, matching or exceeding its accuracy while being more interpretable and more compact.
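
The claim that a linear model can capture the same interactions rests on a simple fact: a model that is linear in its weights can still represent a non-linear function of the raw inputs once the right features are constructed. The toy sketch that follows is a hedged illustration, not the article's method: it uses hypothetical grid data rather than California Housing, pure-Python ordinary least squares via the normal equations, and a hand-picked product feature x1 * x2 rather than features extracted from XGBoost. It fits y = x1 * x2 twice, once on the raw features and once with the product added.

```python
from itertools import product


def solve(a, b):
    """Solve a @ x = b for a small square system by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations X^T X w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, w):
    """Coefficient of determination of the linear predictions rows @ w."""
    preds = [sum(wi * xi for wi, xi in zip(w, r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy target with a pure interaction: y = x1 * x2 on a 3x3 grid.
grid = list(product([-1.0, 0.0, 1.0], repeat=2))
y = [x1 * x2 for x1, x2 in grid]

raw = [[1.0, x1, x2] for x1, x2 in grid]                  # intercept + raw features
engineered = [[1.0, x1, x2, x1 * x2] for x1, x2 in grid]  # + constructed interaction

r2_raw = r_squared(raw, y, fit_ols(raw, y))
r2_eng = r_squared(engineered, y, fit_ols(engineered, y))
print(f"raw R^2 = {r2_raw:.2f}, engineered R^2 = {r2_eng:.2f}")
# → raw R^2 = 0.00, engineered R^2 = 1.00
```

On this symmetric grid the product x1 * x2 is uncorrelated with the intercept, x1, and x2, so the raw-feature fit predicts zero everywhere; adding the constructed feature makes the same linear model exact. The article's own features are derived from XGBoost's tree structure via geometric algebra and are not reproduced here.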

Key takeaway

For data scientists and machine learning engineers who struggle with model interpretability or want performance gains beyond traditional feature engineering, geometric algebra is worth exploring. The approach suggests that extracting and reusing the geometric features that tree ensembles such as XGBoost learn implicitly can produce linear models that match, and potentially exceed, the ensemble's performance while offering significant interpretability and compression benefits. A sensible next step is to study coordinate-free geometry and how it applies to feature construction.

Key insights

Feature engineering via geometric algebra can enable linear models to outperform complex ensembles like XGBoost.

Principles

Method

Extract geometric relationships (co-occurrence, importance, predictive regions) directly from a trained XGBoost model's tree structure, then use these to systematically construct features for a linear model.
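
The extraction step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the two trees are hypothetical and hand-written, loosely modeled on the nested node layout of XGBoost's JSON tree dump (split nodes carry "split" and "children", leaves carry "leaf"), and the feature names are borrowed from California Housing only for readability. It covers one of the three relationships named above, co-occurrence: each pair of features that appears on the same root-to-leaf path is counted, and the most frequent pairs become candidate features. A plain pairwise product (the hypothetical helper `product_features`) stands in for the article's geometric-algebra construction.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-written trees (not from a trained model), loosely shaped
# like XGBoost's JSON dump: split nodes have "split"/"children", leaves "leaf".
TREES = [
    {"nodeid": 0, "split": "MedInc", "split_condition": 5.0, "children": [
        {"nodeid": 1, "split": "AveOccup", "split_condition": 3.0, "children": [
            {"nodeid": 3, "leaf": 0.4}, {"nodeid": 4, "leaf": -0.2}]},
        {"nodeid": 2, "split": "Latitude", "split_condition": 37.0, "children": [
            {"nodeid": 5, "leaf": 0.9}, {"nodeid": 6, "leaf": 0.1}]},
    ]},
    {"nodeid": 0, "split": "Latitude", "split_condition": 34.5, "children": [
        {"nodeid": 1, "split": "Longitude", "split_condition": -118.0, "children": [
            {"nodeid": 3, "leaf": 0.3}, {"nodeid": 4, "leaf": -0.1}]},
        {"nodeid": 2, "split": "Longitude", "split_condition": -121.0, "children": [
            {"nodeid": 5, "leaf": 0.2}, {"nodeid": 6, "leaf": -0.4}]},
    ]},
]


def leaf_paths(node, prefix=()):
    """Yield the split features met on each root-to-leaf path."""
    if "leaf" in node:
        yield prefix
        return
    for child in node["children"]:
        yield from leaf_paths(child, prefix + (node["split"],))


def cooccurrence(trees):
    """Count how often each (sorted) feature pair shares a root-to-leaf path."""
    counts = Counter()
    for tree in trees:
        for path in leaf_paths(tree):
            for pair in combinations(sorted(set(path)), 2):
                counts[pair] += 1
    return counts


def product_features(row, pairs):
    """Hypothetical stand-in construction: one product feature per pair."""
    return {f"{a}*{b}": row[a] * row[b] for a, b in pairs}


counts = cooccurrence(TREES)
top_pairs = [pair for pair, _ in counts.most_common(1)]
row = {"MedInc": 3.0, "AveOccup": 2.0, "Latitude": 34.0, "Longitude": -118.0}
print(top_pairs, product_features(row, top_pairs))
# → [('Latitude', 'Longitude')] {'Latitude*Longitude': -4012.0}
```

With a real model, the tree dicts would come from parsing each string returned by XGBoost's `Booster.get_dump(dump_format="json")` with `json.loads` rather than being written by hand; the split thresholds and leaf values, ignored here, are where importance and predictive-region signals would have to come from.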

Best for: Machine Learning Engineer, Data Scientist, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Agus’s Substack.