Linear Models Hiding Inside Your Decision Tree

2026-01-11 · Source: Agus’s Substack · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, medium

Summary

DirectRS is a method that extracts the geometric structure implicit in a decision tree and uses it to improve classification performance and interpretability. It builds a "stretch matrix" (S') from the tree's path structure, encoding feature co-occurrence and importance. S' is then used to create a geometry-aware embedding of each sample, and the embeddings are fed into a multinomial logistic regression model. On a 4-class vowel identification dataset with 5 acoustic features, DirectRS raised the Area Under the Curve (AUC) from 0.986 to 0.989 and cut the Brier score by 37.8% (from 0.0982 to 0.0610) relative to a baseline decision tree. The analysis also showed that F1 and F2 form an inseparable geometric unit, and that F3, despite its low boundary importance, is critical for within-leaf corrections.

Key takeaway

For AI Engineers and Research Scientists working with decision trees, consider applying DirectRS to extract and use the tree's intrinsic geometry. In the reported experiment the main gain was in the calibration of probability estimates (Brier score down 37.8%), with a smaller gain in ranking quality (AUC 0.986 to 0.989), achieved without complex gradient descent or extensive hyperparameter tuning. Analyzing the derived stretch matrix (S') also gives insight into feature interactions and importance beyond traditional gain-based metrics, revealing predictive signal hidden within leaf nodes.

Key insights

Decision trees implicitly learn geometric feature relationships that can be explicitly extracted to improve model performance and interpretability.

Principles

Supervised models are linear in the correct representation space.
ℓ₂-regularized linear models are invariant to orthogonal transforms of the features: rotating the inputs leaves the fitted predictions unchanged.
Feature importance can be decomposed into boundary and within-leaf contributions.
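
The second principle above can be checked numerically. The sketch below is an illustration only: it uses closed-form ridge regression as the simplest ℓ₂-regularized model (a stand-in chosen because it has an exact solution, not the multinomial logistic regression that DirectRS fits), and the data, the penalty value, and the helper name ridge_fit are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # synthetic features
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# A random orthogonal matrix from the QR factorisation of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))

lam = 1.0                                          # hypothetical penalty strength
w_x = ridge_fit(X, y, lam)                         # fit on the original features
w_z = ridge_fit(X @ Q, y, lam)                     # fit on the rotated features

pred_x = X @ w_x
pred_z = (X @ Q) @ w_z
max_gap = float(np.max(np.abs(pred_x - pred_z)))   # largest prediction difference
```

The rotated fit recovers the rotated weights (w_z equals Qᵀ w_x up to floating-point error), so the two sets of predictions agree. A non-orthogonal map such as a stretch matrix does change the ℓ₂-regularized solution, which is why the choice of stretch matters.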

Method

DirectRS constructs a value-weighted co-occurrence matrix G from the decision tree's paths, computes its matrix square root S' (the stretch matrix), and then fits a multinomial logistic regression on embeddings formed by concatenating a per-leaf intercept with S'x, where x is the sample's feature vector.
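
The pipeline described above can be sketched roughly as follows. This is a minimal reading of the summary, not the original article's implementation, and several choices are assumptions: each root-to-leaf path is weighted by the fraction of training samples reaching its leaf (the summary does not define the "value" weighting), the per-leaf intercept is realised as a one-hot leaf indicator, and the iris data stands in for the vowel dataset. The helper name leaf_paths is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)        # stand-in data, not the vowel dataset
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
t = tree.tree_
n_features = X.shape[1]

def leaf_paths(node=0, used=()):
    """Yield (leaf_id, features split on along the root-to-leaf path)."""
    if t.children_left[node] == -1:      # -1 marks a leaf in scikit-learn trees
        yield node, used
    else:
        used = used + (int(t.feature[node]),)
        yield from leaf_paths(t.children_left[node], used)
        yield from leaf_paths(t.children_right[node], used)

# ASSUMPTION: weight each path by the share of training samples in its leaf.
G = np.zeros((n_features, n_features))
for leaf, feats in leaf_paths():
    v = np.zeros(n_features)
    v[list(set(feats))] = 1.0            # indicator of features on this path
    weight = t.n_node_samples[leaf] / t.n_node_samples[0]
    G += weight * np.outer(v, v)         # co-occurrence of features on the path

# Symmetric PSD square root of G via its eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(G)
S = (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T

# Embedding: one-hot leaf indicator (per-leaf intercept) concatenated with S'x.
leaf_ids = tree.apply(X)
leaves = np.unique(leaf_ids)
one_hot = (leaf_ids[:, None] == leaves[None, :]).astype(float)
Z = np.hstack([one_hot, X @ S.T])

clf = LogisticRegression(max_iter=5000).fit(Z, y)
proba = clf.predict_proba(Z)             # in-sample probabilities, for illustration
```

Off-diagonal entries of G show which features appear together on the same paths, and a feature the tree never splits on gets a zero row and column in both G and S'. Any real comparison against the baseline tree should use held-out data rather than the in-sample probabilities computed here.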

In practice

Use DirectRS to improve decision tree probability calibration.
Analyze the G matrix to understand feature interactions.
Decompose feature importance into boundary vs. within-leaf roles.
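
Checking whether calibration actually improved requires a probability-quality metric such as the Brier score. The sketch below uses one common multiclass definition, the mean over samples of the squared distance between the predicted probability vector and the one-hot label. The source does not say which variant it reports, so treat this definition as an assumption; the function name multiclass_brier and the toy inputs are hypothetical.

```python
import numpy as np

def multiclass_brier(proba, y_true):
    """Mean over samples of sum_k (p_k - onehot_k)^2.

    proba:  array of shape (n_samples, n_classes), rows summing to 1
    y_true: integer class indices in 0..n_classes-1
    """
    proba = np.asarray(proba, dtype=float)
    y_true = np.asarray(y_true)
    one_hot = np.zeros_like(proba)
    one_hot[np.arange(len(y_true)), y_true] = 1.0
    return float(np.mean(np.sum((proba - one_hot) ** 2, axis=1)))

# Toy check: one perfect prediction and one maximally uncertain prediction.
toy_proba = [[1.0, 0.0], [0.5, 0.5]]
toy_labels = [0, 1]
toy_score = multiclass_brier(toy_proba, toy_labels)   # (0 + 0.5) / 2 = 0.25
```

Lower is better, and a perfectly confident, perfectly correct model scores 0. Comparing this value for the baseline tree and for the DirectRS model on the same held-out samples gives the kind of calibration comparison the summary reports.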

Topics

DirectRS
Decision Tree Geometry
Feature Importance
Model Calibration

Best for: AI Engineer, AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Agus’s Substack.